Abstract. The reconstruction problem on the tree has been studied
in numerous contexts including statistical physics, information theory
and computational biology. However, rigorous reconstruction thresholds
have only been established in a small number of models. We prove the
first exact reconstruction threshold in a non-binary model, establishing
the Kesten-Stigum bound for the 3-state Potts model on regular trees of
large degree. We further establish that the Kesten-Stigum bound is not
tight for the $q$-state Potts model when $q \ge 5$. Moreover, we determine
asymptotics for the reconstruction thresholds.



1. Introduction 

1.1. Preliminaries. We begin by giving a general description of broadcast
(or Markov) models on trees and the reconstruction problem. The broadcast
model on a tree $T$ is a model in which information is sent from the root $\rho$
across the edges, which act as noisy channels, to the leaves of $T$. For some
given finite set of characters $\mathcal{C}$, a configuration on $T$ is an element of $\mathcal{C}^T$,
that is, an assignment of a character in $\mathcal{C}$ to each vertex. We will denote the
elements of $\mathcal{C}$ as $\{1, \ldots, q\}$ and $q = |\mathcal{C}|$ as the number of characters. The
broadcast model is a probability distribution on configurations defined as
follows. Some $|\mathcal{C}| \times |\mathcal{C}|$ probability transition matrix $M$ is chosen as the
noisy channel on each edge. The spin $\sigma_\rho$ is chosen from $\mathcal{C}$ according to some
initial distribution and is then propagated along the edges of the tree
according to the transition matrix $M$. That is, if vertex $u$ is the parent of $v$
in the tree then the spin at $v$ is defined according to the probabilities

\[ P(\sigma_v = j \mid \sigma_u = i) = M_{i,j}. \]

The focus of this paper is on the symmetric channels, which are given by
transition matrices of the form

\[ M_{i,j} = \begin{cases} 1 - p & \text{if } i = j, \\ \dfrac{p}{q-1} & \text{otherwise,} \end{cases} \]

where $0 < p \le 1$. The state of the root is chosen according to the uniform
distribution on $\mathcal{C}$.
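The broadcast process just described can be simulated directly. The following is a minimal sketch, not taken from the paper: the function names `symmetric_channel` and `broadcast` are hypothetical, characters are labelled $0, \ldots, q-1$ instead of $1, \ldots, q$, and the tree is taken to be the $d$-ary tree used later in the text.

```python
import random
from fractions import Fraction


def symmetric_channel(q, p):
    """Return the q x q matrix with 1 - p on the diagonal and p / (q - 1) elsewhere."""
    off = p / (q - 1)
    return [[(1 - p) if i == j else off for j in range(q)] for i in range(q)]


def broadcast(q, p, d, depth, rng):
    """Sample the spins level by level on the d-ary tree.

    Characters are labelled 0, ..., q - 1 here (the text uses 1, ..., q).
    Returns a list whose n-th entry holds the d**n spins at distance n from the root.
    """
    M = symmetric_channel(q, p)
    levels = [[rng.randrange(q)]]  # root spin, uniform on the characters
    for _ in range(depth):
        children = []
        for parent_spin in levels[-1]:
            # each of the d children is drawn independently from row parent_spin of M
            children.extend(rng.choices(range(q), weights=M[parent_spin], k=d))
        levels.append(children)
    return levels


# Exact entries of the channel for q = 3, p = 1/4, and one sample of depth 4 with d = 2.
M_example = symmetric_channel(3, Fraction(1, 4))
levels_example = broadcast(3, 0.25, 2, 4, random.Random(0))
```

Passing a `Fraction` for `p` keeps the matrix entries exact; the sampler is called with a float because only the relative weights matter there.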
The symmetric channel corresponds to the $q$-state Potts model on the
tree. The Potts model weights configurations according to the Hamiltonian
$H(\sigma) = \sum_{(u,v) \in E} 1_{\{\sigma_u = \sigma_v\}}$, which counts the number of edges on which the
characters on each side are equal. On a finite tree the probability distribution
is given by

\[ P(\sigma) = \frac{1}{Z} \exp(\beta H(\sigma)), \]

where $Z$ is a normalising constant. On an infinite tree more than one Gibbs
measure may exist; the symmetric channel corresponds to the free Gibbs
measure. The two models coincide when $1 - p = \frac{e^{\beta}}{e^{\beta} + q - 1}$. It will be convenient
to parameterise the symmetric channel by its second largest eigenvalue by
absolute value (that is either the second eigenvalue or the last eigenvalue,
whichever is larger). It is given by

\[ \lambda = 1 - \frac{pq}{q-1} = \frac{e^{\beta} - 1}{e^{\beta} + q - 1} \]

and takes values in the interval $[-\frac{1}{q-1}, 1)$. The special case of proper
colourings corresponds to $\lambda = -\frac{1}{q-1}$. In line with the terminology for the Potts
model we will say the channel is ferromagnetic when $\lambda > 0$ and
anti-ferromagnetic when $\lambda < 0$.
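The three parameterisations of the symmetric channel (flip probability $p$, inverse temperature $\beta$, eigenvalue $\lambda$) are easy to confuse, so a small conversion sketch may help. It assumes the relations $\lambda = 1 - pq/(q-1)$ and $1 - p = e^{\beta}/(e^{\beta} + q - 1)$; the function names are hypothetical.

```python
import math
from fractions import Fraction


def lam_from_p(q, p):
    """Second eigenvalue of the symmetric channel: lambda = 1 - p q / (q - 1)."""
    return 1 - p * q / (q - 1)


def p_from_beta(q, beta):
    """Flip probability matching inverse temperature beta: 1 - p = e^beta / (e^beta + q - 1)."""
    return (q - 1) / (math.exp(beta) + q - 1)


def lam_from_beta(q, beta):
    """The same eigenvalue written in terms of beta."""
    e = math.exp(beta)
    return (e - 1) / (e + q - 1)
```

With $p = 1$ the first function returns $-1/(q-1)$, the proper colouring case, and the sign of $\beta$ decides between the ferromagnetic and anti-ferromagnetic regimes.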

We will restrict our attention to $d$-ary trees, that is the infinite rooted tree
where every vertex has $d$ offspring. Let $\sigma(n)$ denote the spins at distance $n$
from the root and let $\sigma^i(n)$ denote $\sigma(n)$ conditioned on $\sigma_\rho = i$.

Definition 1. We say that a model is reconstructible on a tree $T$ if for some
$i, j \in \mathcal{C}$,

\[ \limsup_{n} d_{TV}(\sigma^i(n), \sigma^j(n)) > 0 \]

where $d_{TV}$ is the total variation distance. When the limsup is 0 we will say
the model has non-reconstruction on $T$.

Non-reconstruction is equivalent to the mutual information between $\sigma_\rho =
\sigma(0)$ and $\sigma(n)$ going to 0 as $n$ goes to infinity and also to $\{\sigma(n)\}_{n=1}^{\infty}$ having
a trivial tail sigma-field. In terms of Gibbs measures non-reconstruction is
equivalent to the free measure being extremal, that is not a convex combination
of two other Gibbs measures. More equivalent formulations are given
in [16] Proposition 2.1. In contrast consider the uniqueness property of a
Gibbs measure.

Definition 2. We say that a model has uniqueness on a tree $T$ if

\[ \limsup_{n} \sup_{A,B} d_{TV}\big(P(\sigma_\rho = \cdot \mid \sigma(n) = A), P(\sigma_\rho = \cdot \mid \sigma(n) = B)\big) = 0 \]

where the supremum is over all configurations $A, B$ on the vertices at distance
$n$ from the root.

Reconstruction implies non-uniqueness and is a strictly stronger condition.
Essentially non-uniqueness says that there is some configuration on the
leaves which provides information on the root while reconstruction says that
a typical configuration on the leaves provides information on the root.

1.2. Background. For a given parameterized collection of models the key
question in studying reconstruction is finding which models have reconstruction,
which typically involves finding a threshold. The reconstruction problem
naturally arises in biology, information theory and statistical physics and
involves the trade-off between increasing numbers of leaves and increasingly
noisy information as the distance from the root to the leaves increases. In
the case of the Potts model the question is for which $\lambda$ there is reconstruction,
for each choice of $q$ and $d$. Proposition 12 of [14] implies that for
each $q$ and $d$ there exist $\lambda^- < 0 < \lambda^+$ such that there is non-reconstruction
when $\lambda \in (\lambda^-, \lambda^+)$ and reconstruction when $\lambda \in [-\frac{1}{q-1}, \lambda^-) \cup (\lambda^+, 1)$. The
result does not say what happens when $\lambda \in \{\lambda^-, \lambda^+\}$.

The most general result on reconstruction is the Kesten-Stigum bound
[10] which says that reconstruction holds when $\lambda^2 d > 1$, which in our
parameterisation says that $\lambda^+ \le d^{-1/2}$ and $\lambda^- \ge -d^{-1/2}$. In fact when $d\lambda^2 > 1$
it is possible to asymptotically reconstruct the root from just knowing the
number of times each character appears on the leaves (census reconstruction)
without using the information on their positions on the leaves.
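As a small illustration of the Kesten-Stigum condition, the following sketch checks whether a given pair $(d, \lambda)$ lies in the regime $d\lambda^2 > 1$ where the bound guarantees reconstruction. The function names are hypothetical, and the check says nothing about what happens when $d\lambda^2 \le 1$, which is the subject of the paper.

```python
def kesten_stigum_holds(d, lam):
    """True when d * lam**2 > 1, the Kesten-Stigum sufficient condition for reconstruction."""
    return d * lam * lam > 1


def kesten_stigum_threshold(d):
    """The value d**(-1/2) of |lambda| above which the Kesten-Stigum bound applies."""
    return d ** -0.5
```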

The simplest collection of models is the binary (2-state) symmetric channel,
which is defined on two characters and corresponds to the Ising model on
the tree with no external field. It was shown in [3] and [9] that this channel
has reconstruction if and only if $d\lambda^2 > 1$, that is, the Kesten-Stigum bound
is sharp. Before this paper exact reconstruction thresholds had only been
calculated in the binary symmetric channel and binary asymmetric channels
with sufficiently small asymmetry [3], where the Kesten-Stigum bound is also sharp.
Mossel [14, 16] showed that the Kesten-Stigum bound is not the bound for
reconstruction in the binary-asymmetric model with sufficiently large asymmetry
or in the ferromagnetic Potts model with q > 18. For general Potts
models [17] showed non-reconstruction when



2+{q-2)X- 

and these bounds were improved in [12]. Several recent results deal with
the special case of proper colourings, where the threshold is now known to good accuracy.
By analysing a simple reconstruction algorithm, reconstruction was shown
to hold when $d > q[\log q + \log\log q + 1 + o(1)]$, see [20]. The tightest
bounds for non-reconstruction are $d < q[\log q + \log\log q + 1 - \log 2 + o(1)]$,
established by [21]; the difference between the upper and lower bounds is
just $q \log 2$.

Using techniques from statistical physics and including numerical sim- 
ulations Mezard and Montanari [13] made a series of conjectures for the 
symmetric channels. 

Conjecture 1 ([13]). The Kesten-Stigum bound is tight for the ferromagnetic
symmetric channel when $q \le 4$ and is not tight when $q \ge 5$. In the
anti-ferromagnetic model the Kesten-Stigum bound is tight when $q \le 3$ and
not tight when $q \ge 4$.

As this conjecture was based on numerical evidence they qualified it by 
stating that it might not hold for large d. This paper confirms much of the 
predicted picture. 

1.3. Main Results. Our results confirm much of the picture predicted by
Mezard and Montanari [13]. We give a complete picture for large $d$ except
in the case of $q = 4$, which the proof will show is a critical case. The $q = 4$
case will be dealt with in a subsequent paper.

Theorem 1.1. When $q = 3$ there exists a $d_{\min}$ such that for $d > d_{\min}$ the
Kesten-Stigum bound is sharp for both the ferromagnetic and anti-ferromagnetic
channels, that is $\lambda^+(d) = d^{-1/2}$ and $\lambda^-(d) = -d^{-1/2}$. Furthermore
there is non-reconstruction at the Kesten-Stigum bound, when $\lambda = \lambda^+$ or
$\lambda = \lambda^-$.

Conversely when $q \ge 5$ the Kesten-Stigum bound is never sharp.

Theorem 1.2. When $q \ge 5$, for every $d$ the Kesten-Stigum bound is not
sharp, that is $\lambda^+ < d^{-1/2}$ and $\lambda^- > -d^{-1/2}$.

1.3.1. Asymptotic results. When the Kesten-Stigum bound is not sharp we
are not able to exactly compute the threshold; doing so involves finding a
non-trivial fixed point of an equation of vector-valued distributions. Nonetheless
we are able to give precise asymptotics for the thresholds when $q$ is fixed and
$d$ goes to infinity. In light of the Kesten-Stigum bound it makes sense to
consider $d^{1/2}\lambda^{\pm}$. When $q \ge 5$ the limits are strictly different from $\pm 1$.

Theorem 1.3. When $q \ge 5$,

\[ \lim_{d \to \infty} d^{1/2}\lambda^+ = C_q, \qquad \lim_{d \to \infty} d^{1/2}\lambda^- = -C_q, \]

where $C_q$ is a constant strictly less than 1.

Of course when $q = 3$ we have that $d^{1/2}\lambda^{\pm} = \pm 1$ for large $d$.

1.4. Applications. The broadcast model is a natural model for the evo- 
lution of characters of DNA. In phylogenetic reconstruction the goal is to 
reconstruct the ancestry tree of a collection of species given their genetic 
data. Establishing a conjecture of Mike Steel it was shown that the num- 
ber of samples required for phylogenetic reconstruction undergoes a phase 
transition at the reconstruction threshold for the binary symmetric channel 

mm- 

The reconstruction threshold on trees is believed to play a critical role
in the dynamic phase transitions in certain glassy systems given by random
constraint satisfaction problems such as random K-SAT and the
anti-ferromagnetic Potts model on random graphs. We will briefly describe the
broad picture conjectured by physicists about such systems [11, 22], generally
without rigorous proof, and why understanding the reconstruction
threshold for colourings plays an important role in such systems. The theory
relates to the structure and connectivity of the set of configurations
which support most of the measure of the distribution, with the topology
given by the Hamming distance on the space of configurations.

At "high temperatures" or low densities of constraints the Gibbs measure
places all but an exponentially small fraction of its weight in a single
"connected cluster". As the temperature decreases there is a threshold
called the "dynamical replica symmetry breaking threshold" at which the
set supporting most of the measure splits into exponentially many smaller
clusters. The clusters are well separated from each other and each
contains an exponentially small amount of the measure, but together they contain
all but an exponentially small amount of the measure. This threshold is
believed to correspond to the reconstruction threshold on the corresponding
tree model. A recent result [1] rigorously proved that for random
colourings on Erdős-Rényi random graphs with average connectivity $d$, when
$(1 + o(1)) q \log q \le d \le (2 - o(1)) q \log q$ the space of solutions indeed breaks
into exponentially many small clusters. The lower bound corresponds to the
reconstruction threshold for colourings on the tree [21].

Another threshold, the condensation threshold, is believed to occur at
even lower temperatures. At this point clusters exist with a positive fraction
of the measure; these masses are believed to be jointly given by a
Poisson-Dirichlet distribution. When the Kesten-Stigum bound is tight these
thresholds coincide and there is no phase where the clusters all have a small
proportion of the mass.

The reconstruction threshold is also believed to play an important role in
the efficiency of the Glauber dynamics on trees and random graphs. In [2]
it was shown that the mixing time for the Glauber dynamics on trees
is $n^{1+\Omega(1)}$ when the model has reconstruction, slower than at higher
temperatures where the mixing time is $O(n \log n)$. In the case of the Ising
model this is tight: the mixing time is $O(n \log n)$ when $d\lambda^2 < 1$.

Local MCMC algorithms are conjectured to be efficient up to the
reconstruction threshold for sampling random colourings on random graphs
but experience an exponential slowdown beyond it [11]. This is to be
expected since a local MCMC algorithm cannot move between clusters each
of which has exponentially small probability. Rigorous proofs of rapid mixing
of MCMC algorithms, such as the Glauber dynamics, fall a long way
behind. For colourings of random regular graphs, results of [8] imply rapid
mixing when $q > 1.49d$, well below the reconstruction threshold and even the
uniqueness threshold. Even less is known for Erdős-Rényi random graphs as
almost all MCMC results are given in terms of the maximum degree, which
in this case grows with $n$. Polynomial time mixing of the Glauber dynamics
has been shown [18] for a constant number of colours in terms of $d$, the
average connectivity.

1.5. Proof Sketch. The proof analyses a quantity denoted by $x_n$. One
interpretation of $x_n$ is that if we guess the value of $\sigma_\rho$ according to its
posterior distribution given $\sigma(n)$ then $x_n$ is the probability of being correct
minus $\frac{1}{q}$, which is the chance of being correct by simply guessing randomly.
More formally if $Z$ is a $\mathcal{C}$-valued random variable with distribution given by
$P(Z = i \mid \sigma(n)) = P(\sigma_\rho = i \mid \sigma(n))$ then $x_n = P(Z = \sigma_\rho) - \frac{1}{q}$. Our analysis
is similar to the expansion of [5] but with more precise estimates derived
by establishing concentration results. Such expansions go back to [6] in the
context of spin-glasses.

We show that $x_n$ is always positive and that non-reconstruction is
equivalent to

\[ \lim_{n} x_n = 0. \]

In general finding the reconstruction threshold requires understanding recursive
equations of vector-valued distributions, c.f. [13]. However, when $x_n$, the
amount of information shared between the leaves and the root, is small
the equations become close to linear. Taylor series expansions
and concentration estimates establish that for small $x_n$

\[ x_{n+1} = d\lambda^2 x_n + (1 + o(1)) \frac{d(d-1)}{2} \frac{q(q-4)}{q-1} \lambda^4 x_n^2. \tag{1.1} \]

A key role is played by the sign of $q - 4$. When $q \ge 5$ it is positive and
this allows us to show that if $d\lambda^2$ is sufficiently close to 1 then $x_n$ does not
converge to 0 and hence there is reconstruction beyond the Kesten-Stigum
bound.
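The role of the sign of $q - 4$ can be seen by iterating the recursion with the $o(1)$ term dropped. The sketch below is only a heuristic illustration under that assumption; it takes the second order coefficient to be $\frac{d(d-1)}{2}\frac{q(q-4)}{q-1}\lambda^4$ and uses hypothetical function names. It is not a substitute for the rigorous estimates of the paper.

```python
def second_order_coeff(q, d, lam):
    """Coefficient of x_n**2 in the truncated recursion: d(d-1)/2 * q(q-4)/(q-1) * lam**4."""
    return d * (d - 1) / 2 * q * (q - 4) / (q - 1) * lam ** 4


def step(x, q, d, lam):
    """One step of x -> d lam^2 x + coeff * x^2, with the (1 + o(1)) factor set to 1."""
    return d * lam ** 2 * x + second_order_coeff(q, d, lam) * x * x


# At the Kesten-Stigum bound d * lam**2 = 1 the linear term is neutral,
# so the quadratic term decides whether a small x grows or shrinks.
d_example = 10
lam_example = d_example ** -0.5
grows_for_q5 = step(0.01, 5, d_example, lam_example) > 0.01
shrinks_for_q3 = step(0.01, 3, d_example, lam_example) < 0.01
```

For $q = 4$ the coefficient vanishes, which is consistent with the remark that $q = 4$ is a critical case.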
However, when $q = 3$ the second order term is negative. Suppose we
could establish that $x_n$ is eventually small when $d\lambda^2 \le 1$. Then equation
(1.1) implies that $x_n$ converges to 0, which establishes non-reconstruction.
Unfortunately for small $d$ we are not able to show that $x_n$ becomes
sufficiently small to apply this argument.

When $d$ is large the interactions between spins become very weak but
there are many of them. Using the Central Limit Theorem we approximate
this collection of small independent interactions to show that

\[ x_{n+1} \approx g_q(d\lambda^2 x_n) \]

for some increasing function $g_q$. When $q = 3$, for all $0 < s < 1$ the function
satisfies $g_3(s) < s$. Using this estimate, for large enough $d$ it is established
that $x_n$ becomes arbitrarily small. Combining this with equation (1.1) proves
non-reconstruction for large enough $d$. When $q = 4$, for all $0 < s < 1$ the
function also satisfies $g_4(s) < s$, while when $q \ge 5$ the equation $g_q(s) = s$ has
nonzero solutions. The function $g_q(s)$ determines the limiting value of $x_n$, a
consequence of which is Theorem 1.3.

2. Proofs

We introduce the notation we use in the proofs. We denote the colours
by $\mathcal{C} = \{1, \ldots, q\}$ and let $T$ be the $d$-ary tree rooted at $\rho$. Let $u_1, \ldots, u_d$ be
the children of $\rho$ and for a vertex $v \in T$ let $T_v$ denote the subtree of descendants
of $v$ (including $v$). Throughout the paper we will use the convention
that $i$ will denote an element of $\mathcal{C}$ and $j$ will be an element of $\{1, \ldots, d\}$
corresponding to a child of $\rho$. Let $\sigma$ denote a random configuration given
by the symmetric channel with transition matrix given by

\[ M_{i,j} = \begin{cases} 1 - p & \text{if } i = j, \\ \dfrac{p}{q-1} & \text{otherwise,} \end{cases} \tag{2.1} \]

where $0 < p \le 1$. Rather than looking at the unconditioned configurations
$\sigma$ we will work mainly with configurations where the spin at the root
is conditioned; we let $\sigma^i$ denote a random configuration according to the
symmetric channel conditioned on $\sigma_\rho = i$.

In light of the Kesten-Stigum bound we will always assume that $d\lambda^2 \le 1$.

Let $S(n)$ denote the vertices on level $n$, $\{v \in T : d(v, \rho) = n\}$, let $\sigma(n) :=
\sigma_{S(n)}$ denote the spins on $S(n)$ and let $\sigma_j(n)$ denote the spins in $S(n) \cap T_{u_j}$.
For a configuration $A$ on $S(n)$ define the posterior function as

\[ f_n(i, A) = P(\sigma_\rho = i \mid \sigma(n) = A). \]

By the recursive nature of the tree, for a configuration $A$ on $S(n+1) \cap T_{u_j}$
we also have (with a slight abuse of notation) that

\[ f_n(i, A) = P(\sigma_{u_j} = i \mid \sigma_j(n+1) = A). \]

Now define $X_i(n) = X_i$ by

\[ X_i(n) = f_n(i, \sigma(n)). \]

These random variables are a deterministic function of the random configuration
$\sigma(n)$ of the leaves which gives the posterior probability that the root
is in state $i$. Recall that a collection of random variables is exchangeable
if its distribution is invariant under permutations. By symmetry the $X_i$
are exchangeable. Now we define two random variables

\[ X^+ = X^+(n) = f_n(1, \sigma^1(n)) \]

and

\[ X^- = X^-(n) = f_n(2, \sigma^1(n)). \]

We will establish non-reconstruction (respectively reconstruction) by showing
that $X^+$ and $X^-$ both converge (resp. do not converge) to $\frac{1}{q}$ in probability
as $n$ goes to infinity. By symmetry we have

\[ f_n(i_2, \sigma^{i_1}(n)) \stackrel{d}{=} \begin{cases} X^+ & \text{if } i_1 = i_2, \\ X^- & \text{otherwise,} \end{cases} \]

and the set $\{f_n(i, \sigma^1(n)) : 2 \le i \le q\}$ is exchangeable. Moreover they are
conditionally exchangeable given $f_n(1, \sigma^1(n))$.

Now define

\[ Y_{ij} = Y_{ij}(n) = f_n(i, \sigma^1_j(n+1)). \]

This is none other than the posterior probability that $\sigma_{u_j} = i$ given the
random configuration $\sigma^1_j(n+1)$ on the spins in $S(n+1) \cap T_{u_j}$. Conditional on the
spin at the root, the spins in the subtrees $T_{u_j}$ are conditionally independent
for $j = 1, \ldots, d$. Taking advantage of this and the symmetries of the model
the following proposition is immediate.

Proposition 2.1. The $Y_{ij}$ satisfy the following properties:

• The random vectors $Y_j = (Y_{1j}, \ldots, Y_{qj})$ are independent for $j =
1, \ldots, d$.

• Conditional on $\sigma^1_{u_j}$ the random variable $Y_{\sigma^1_{u_j} j}$ is equal in distribution
to $X^+(n)$, while for $i \ne \sigma^1_{u_j}$ the random variables $Y_{ij}$ are equal in
distribution to $X^-(n)$.

• Further, given $\sigma^1_{u_j}$ and $Y_{\sigma^1_{u_j} j}$ the random variables $\{Y_{ij}\}_{i \ne \sigma^1_{u_j}}$ are
conditionally exchangeable.

The key method of this paper will be to analyze the relation between the
distributions $X^+(n)$ and $X^+(n+1)$ using the recursive structure of the tree.
Suppose $A$ is a configuration on $S(n+1)$ and let $A_j$ be its restriction to
$T_{u_j} \cap S(n+1)$. The following standard relation follows from the Markov
random field property

\begin{align*}
f_{n+1}(1, A) &= \frac{\prod_{j=1}^{d} \big( M_{11} f_n(1, A_j) + \sum_{l \ne 1} M_{1l} f_n(l, A_j) \big)}{\sum_{i=1}^{q} \prod_{j=1}^{d} \big( M_{ii} f_n(i, A_j) + \sum_{l \ne i} M_{il} f_n(l, A_j) \big)} \\
&= \frac{\prod_{j=1}^{d} \big( M_{12} + (M_{11} - M_{12}) f_n(1, A_j) \big)}{\sum_{i=1}^{q} \prod_{j=1}^{d} \big( M_{12} + (M_{11} - M_{12}) f_n(i, A_j) \big)} \\
&= \frac{\prod_{j=1}^{d} \big( 1 + \lambda q (f_n(1, A_j) - \frac{1}{q}) \big)}{\sum_{i=1}^{q} \prod_{j=1}^{d} \big( 1 + \lambda q (f_n(i, A_j) - \frac{1}{q}) \big)} \tag{2.2}
\end{align*}

where the second equality follows from the fact that $\sum_{l=1}^{q} f_n(l, A_j) = 1$ and
the symmetry of $M$, and the final equality follows from equation (2.1) since

\[ M_{12} + \frac{1}{q}(M_{11} - M_{12}) = M_{12} + \frac{1}{q}\big(1 - (q-1)M_{12} - M_{12}\big) = \frac{1}{q} \]

and

\[ M_{11} - M_{12} = 1 - qM_{12} = \lambda. \]

Conditioning the root to be 1 and letting $A = \sigma^1(n+1)$ we have that

\[ X^+(n+1) = \frac{Z_1}{\sum_{i=1}^{q} Z_i} \tag{2.3} \]

where

\[ Z_i = Z_i(n) = \prod_{j=1}^{d} \Big( 1 + \lambda q \big(Y_{ij}(n) - \frac{1}{q}\big) \Big). \tag{2.4} \]

Equation (2.3) will be our major tool for recursively analysing the reconstruction
problem.
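The recursion (2.2) is simple to implement, and comparing its first and last forms is a useful sanity check. The sketch below is an illustration with hypothetical function names; states are labelled $0, \ldots, q-1$, the root has the uniform prior, and exact rational arithmetic is used so that the two forms can be compared with equality.

```python
from fractions import Fraction


def root_posterior(children, q, lam):
    """Root posterior from the children's posteriors via the product form of (2.2).

    children[j][i] is the posterior probability that child j is in state i
    (states labelled 0, ..., q - 1), given the leaves below that child.
    """
    Z = []
    for i in range(q):
        z = Fraction(1)
        for f in children:
            z *= 1 + lam * q * (f[i] - Fraction(1, q))
        Z.append(z)
    total = sum(Z)
    return [z / total for z in Z]


def root_posterior_direct(children, q, lam):
    """The same quantity computed from the transition matrix M, without simplification."""
    p = (1 - lam) * (q - 1) / q  # inverts lambda = 1 - p q / (q - 1)
    M = [[(1 - p) if i == l else p / (q - 1) for l in range(q)] for i in range(q)]
    weights = []
    for i in range(q):
        w = Fraction(1)
        for f in children:
            w *= sum(M[i][l] * f[l] for l in range(q))
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]
```

The list `Z` in `root_posterior` plays the role of $(Z_1, \ldots, Z_q)$ in (2.4), so the first entry of the returned list corresponds to the ratio in (2.3).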

2.1. Basic Identities. Denote

\[ x_n = E\Big(X^+(n) - \frac{1}{q}\Big) = E f_n(1, \sigma^1(n)) - \frac{1}{q} \]

and

\[ z_n = E\Big(X^+(n) - \frac{1}{q}\Big)^2 = E\Big(f_n(1, \sigma^1(n)) - \frac{1}{q}\Big)^2. \]

As discussed in the introduction the main proof relies on analysing recursions
of $x_n$. This is based on the approach of [5] used in the binary asymmetric
channel but with a more refined analysis, in particular establishing
concentration of the random variables $X_i$. The following lemma, which can be
viewed as the analogue of Lemma 1 of [5], allows us to relate the first and
second moments of $X^+$.

Lemma 2.2. The following relations hold:

\[ x_n + \frac{1}{q} = EX^+ = E\sum_{i=1}^{q} (X_i(n))^2 = E(X^+(n))^2 + (q-1)E(X^-(n))^2, \]

and

\[ x_n = E\sum_{i=1}^{q} \Big(X_i(n) - \frac{1}{q}\Big)^2 = E\Big(X^+(n) - \frac{1}{q}\Big)^2 + (q-1)E\Big(X^-(n) - \frac{1}{q}\Big)^2 \ge z_n. \]

Proof. From the definition of conditional probabilities and of $f_n$ and the fact
that $P(\sigma_\rho = 1) = \frac{1}{q}$ we have that

\begin{align*}
EX^+(n) &= E f_n(1, \sigma^1(n)) \\
&= \sum_{A} f_n(1, A) P(\sigma(n) = A \mid \sigma_\rho = 1) \\
&= \sum_{A} \frac{P(\sigma(n) = A, \sigma_\rho = 1)}{P(\sigma_\rho = 1)} f_n(1, A) \\
&= q \sum_{A} P(\sigma(n) = A) f_n(1, A)^2 \\
&= q E(X_1(n))^2 \\
&= E\sum_{i=1}^{q} (X_i(n))^2
\end{align*}

and

\[ E\sum_{i=1}^{q} \Big(X_i(n) - \frac{1}{q}\Big)^2 = E\sum_{i=1}^{q} (X_i(n))^2 - \frac{2}{q} E\sum_{i=1}^{q} X_i(n) + \frac{1}{q} = EX^+ - \frac{1}{q}. \]

Conditional on $\sigma_\rho$ we have that $X_{\sigma_\rho}(n)$ is distributed as $X^+(n)$ and for
$i \ne \sigma_\rho$ we have that $X_i(n)$ is distributed as $X^-(n)$. It follows that

\[ E\sum_{i=1}^{q} (X_i(n))^2 = E(X^+(n))^2 + (q-1)E(X^-(n))^2 \]

and

\[ E\sum_{i=1}^{q} \Big(X_i(n) - \frac{1}{q}\Big)^2 = E\Big(X^+(n) - \frac{1}{q}\Big)^2 + (q-1)E\Big(X^-(n) - \frac{1}{q}\Big)^2, \]

which completes the result.
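The identities of Lemma 2.2 can be checked exactly on a small tree by enumerating all leaf configurations. The following sketch is an illustration only: the function names are hypothetical, states are labelled $0, \ldots, q-1$ with state 0 playing the role of state 1 in the text, and the enumeration is feasible only for very small $q$, $d$ and $n$ since there are $q^{d^n}$ configurations.

```python
from fractions import Fraction
from itertools import product


def leaf_law(q, p, d, n):
    """law[s][A] = P(sigma(n) = A | root spin = s) on the d-ary tree, by recursion on n."""
    M = [[(1 - p) if i == j else p / (q - 1) for j in range(q)] for i in range(q)]
    law = [{(s,): Fraction(1)} for s in range(q)]  # depth 0: the only leaf is the root
    for _ in range(n):
        # law of the leaves below a single child, given the parent's spin s
        child = []
        for s in range(q):
            acc = {}
            for t in range(q):
                for A, pr in law[t].items():
                    acc[A] = acc.get(A, 0) + M[s][t] * pr
            child.append(acc)
        # the d children are conditionally independent given the parent's spin
        new_law = []
        for s in range(q):
            acc = {}
            for parts in product(child[s].items(), repeat=d):
                A = sum((a for a, _ in parts), ())
                pr = Fraction(1)
                for _, w in parts:
                    pr *= w
                acc[A] = acc.get(A, 0) + pr
            new_law.append(acc)
        law = new_law
    return law


def basic_moments(q, p, d, n):
    """Return (x_n, z_n, E sum_i X_i^2, E sum_i (X_i - 1/q)^2) computed exactly."""
    law = leaf_law(q, p, d, n)
    c = Fraction(1, q)
    x = z = e_sq = e_centered = Fraction(0)
    for A in set().union(*law):
        cond = [law[i].get(A, Fraction(0)) for i in range(q)]
        total = sum(cond)
        post = [w / total for w in cond]  # f_n(i, A) under the uniform root prior
        prob_A = total / q                # P(sigma(n) = A)
        x += cond[0] * (post[0] - c)
        z += cond[0] * (post[0] - c) ** 2
        e_sq += prob_A * sum(f * f for f in post)
        e_centered += prob_A * sum((f - c) ** 2 for f in post)
    return x, z, e_sq, e_centered
```

Passing `p` as a `Fraction` keeps every probability exact, so the two identities of the lemma can be tested with equality rather than a numerical tolerance.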
Define $\hat{\sigma}_\rho(n)$ to be the maximum likelihood estimator of $\sigma_\rho$ given $\sigma(n)$,
which is given by

\[ \hat{\sigma}_\rho(n) := \arg\max_i X_i(n) \]

where, in the case that multiple states maximize the likelihood, the estimator
chooses randomly between these states. This estimator maximizes the
probability of correctly reconstructing the root. Define the probability of
correct reconstruction as

\[ p_n := P(\sigma_\rho = \hat{\sigma}_\rho(n)) = E \max_{1 \le i \le q} X_i(n). \]

Since $\sigma(n)$ is a Markov process, $p_n$ is clearly decreasing.

Lemma 2.3. We have that

\[ x_n + \frac{1}{q} \le p_n \le \frac{1}{q} + x_n^{1/2}. \]

Proof. The inequality $x_n + \frac{1}{q} \le p_n$ was shown in [13] by noting that the
algorithm that chooses $\sigma_\rho$ randomly according to the probabilities $X_i$ is correct
with probability $x_n + \frac{1}{q}$. By the Cauchy-Schwarz inequality and Lemma 2.2,

\begin{align*}
p_n = E \max_i X_i &\le \frac{1}{q} + E \max_i \Big| X_i - \frac{1}{q} \Big| \\
&\le \frac{1}{q} + \Big( E \max_i \Big( X_i - \frac{1}{q} \Big)^2 \Big)^{1/2} \\
&\le \frac{1}{q} + \Big( E \sum_{i=1}^{q} \Big( X_i - \frac{1}{q} \Big)^2 \Big)^{1/2} = \frac{1}{q} + x_n^{1/2}
\end{align*}

as required. ■

The following corollary of Lemmas 2.2 and 2.3 justifies our focus on $x_n$.

Corollary 2.4. We have that $x_n \ge 0$ and the condition

\[ \lim_{n} x_n = 0 \]

is equivalent to non-reconstruction.

Proof. Lemma 2.2 implies that $x_n \ge z_n \ge 0$. By Lemma 2.2, $x_n$ converging
to 0 is equivalent to

\[ E \sum_{i=1}^{q} \Big( X_i(n) - \frac{1}{q} \Big)^2 \to 0, \]

which is equivalent to the posteriors converging to the stationary distribution,
which is in turn equivalent to non-reconstruction [16]. ■
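Using the identities from Lemma 2.2 we calculate the means and covariances
of the $Y_{ij}$.

Lemma 2.5. For each $1 \le j \le d$ the following hold:

\[ E\Big(Y_{1j} - \frac{1}{q}\Big) = \lambda x_n, \qquad E\Big(Y_{1j} - \frac{1}{q}\Big)^2 = \lambda z_n + \frac{1}{q}(1 - \lambda) x_n. \tag{2.5} \]

For $i \ne 1$ we have that

\[ E\Big(Y_{ij} - \frac{1}{q}\Big) = -\frac{\lambda x_n}{q-1}, \qquad E\Big(Y_{ij} - \frac{1}{q}\Big)^2 = \frac{1}{q}\Big(1 + \frac{\lambda}{q-1}\Big) x_n - \frac{\lambda}{q-1} z_n, \tag{2.6} \]

and

\[ E\Big(Y_{1j} - \frac{1}{q}\Big)\Big(Y_{ij} - \frac{1}{q}\Big) = -\frac{\lambda}{q-1} z_n - \frac{1-\lambda}{q(q-1)} x_n. \tag{2.7} \]

When $1 < i_1 < i_2 \le q$,

\[ E\Big(Y_{i_1 j} - \frac{1}{q}\Big)\Big(Y_{i_2 j} - \frac{1}{q}\Big) = \frac{1}{(q-1)(q-2)} \Big[ 2\lambda z_n - \frac{1}{q}(q - 2 + 2\lambda) x_n \Big]. \tag{2.8} \]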

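Proof. By Proposition 2.1 if $\sigma^1_{u_j} = 1$ then $Y_{1j}$ is distributed according to
$X^+(n)$, otherwise it is distributed according to $X^-(n)$. By equation (2.1) we
have that

\[ P(\sigma^1_{u_j} = 1) = M_{11} = \frac{1 + \lambda(q-1)}{q}. \]

Noting that $\sum_{i=1}^{q} Y_{ij} = 1$ it follows that $EX^+(n) + (q-1)EX^-(n) = 1$ and
so $E(X^-(n) - \frac{1}{q}) = -\frac{x_n}{q-1}$. It follows that

\begin{align*}
E\Big(Y_{1j} - \frac{1}{q}\Big) &= P(\sigma^1_{u_j} = 1) E\Big(X^+(n) - \frac{1}{q}\Big) + \big(1 - P(\sigma^1_{u_j} = 1)\big) E\Big(X^-(n) - \frac{1}{q}\Big) \\
&= \frac{1 + \lambda(q-1)}{q} x_n - \Big(1 - \frac{1 + \lambda(q-1)}{q}\Big) \frac{x_n}{q-1} \\
&= \lambda x_n.
\end{align*}

Using Lemma 2.2 and Proposition 2.1 we have that

\begin{align*}
E\Big(Y_{1j} - \frac{1}{q}\Big)^2 &= P(\sigma^1_{u_j} = 1) E\Big(X^+(n) - \frac{1}{q}\Big)^2 + \big(1 - P(\sigma^1_{u_j} = 1)\big) E\Big(X^-(n) - \frac{1}{q}\Big)^2 \\
&= \frac{1 + \lambda(q-1)}{q} z_n + \Big(1 - \frac{1 + \lambda(q-1)}{q}\Big) \frac{1}{q-1} \Big[ E\Big(X^+(n) - \frac{1}{q}\Big) - E\Big(X^+(n) - \frac{1}{q}\Big)^2 \Big] \\
&= \lambda z_n + \frac{1}{q}(1 - \lambda) x_n \tag{2.9}
\end{align*}

which establishes equation (2.5). Now since $\sum_{i=1}^{q} Y_{ij} = 1$ and since by
Proposition 2.1 we have that $Y_{2j}, \ldots, Y_{qj}$ are exchangeable, for $i \ne 1$ we
have that

\[ E\Big(Y_{ij} - \frac{1}{q}\Big) = \frac{1}{q-1} E \sum_{l=2}^{q} \Big(Y_{lj} - \frac{1}{q}\Big) = -\frac{1}{q-1} E\Big(Y_{1j} - \frac{1}{q}\Big) = -\frac{\lambda x_n}{q-1}. \]

Again using Lemma 2.2 and the exchangeability of $Y_{2j}, \ldots, Y_{qj}$ we have that

\begin{align*}
E\Big(Y_{ij} - \frac{1}{q}\Big)^2 &= -\frac{1}{q-1} E\Big(Y_{1j} - \frac{1}{q}\Big)^2 + \frac{1}{q-1} E \sum_{l=1}^{q} \Big(Y_{lj} - \frac{1}{q}\Big)^2 \\
&= \frac{1}{q-1} \Big[ -\Big(\lambda z_n + \frac{1}{q}(1 - \lambda) x_n\Big) + x_n \Big] \\
&= \frac{1}{q}\Big(1 + \frac{\lambda}{q-1}\Big) x_n - \frac{\lambda}{q-1} z_n.
\end{align*}

By the fact that $\sum_{l=2}^{q} (Y_{lj} - \frac{1}{q}) = -(Y_{1j} - \frac{1}{q})$,

\begin{align*}
E\Big(Y_{1j} - \frac{1}{q}\Big)\Big(Y_{ij} - \frac{1}{q}\Big) &= \frac{1}{q-1} E\Big(Y_{1j} - \frac{1}{q}\Big) \sum_{l=2}^{q} \Big(Y_{lj} - \frac{1}{q}\Big) \\
&= -\frac{1}{q-1} E\Big(Y_{1j} - \frac{1}{q}\Big)^2 \\
&= -\frac{\lambda}{q-1} z_n - \frac{1-\lambda}{q(q-1)} x_n
\end{align*}

where the third equality follows from equation (2.9). Finally

\begin{align*}
E\Big(Y_{i_1 j} - \frac{1}{q}\Big)\Big(Y_{i_2 j} - \frac{1}{q}\Big) &= \frac{1}{(q-1)(q-2)} E\Big[ \Big( \sum_{l=2}^{q} \Big(Y_{lj} - \frac{1}{q}\Big) \Big)^2 - \sum_{l=2}^{q} \Big(Y_{lj} - \frac{1}{q}\Big)^2 \Big] \\
&= \frac{1}{(q-1)(q-2)} \Big[ E\Big(Y_{1j} - \frac{1}{q}\Big)^2 - (q-1) E\Big(Y_{2j} - \frac{1}{q}\Big)^2 \Big] \\
&= \frac{1}{(q-1)(q-2)} \Big[ \Big(\lambda z_n + \frac{1}{q}(1-\lambda) x_n\Big) - (q-1)\Big( \frac{1}{q}\Big(1 + \frac{\lambda}{q-1}\Big) x_n - \frac{\lambda}{q-1} z_n \Big) \Big] \\
&= \frac{1}{(q-1)(q-2)} \Big[ 2\lambda z_n - \frac{1}{q}(q - 2 + 2\lambda) x_n \Big].
\end{align*}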
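The moment formulas of Lemma 2.5 can be sanity-checked in the one case where everything is explicit, namely $n = 0$. There the posterior $Y_{ij}$ is just the indicator that the child $u_j$ has spin $i$, so $x_0 = 1 - \frac{1}{q}$ and $z_0 = (1 - \frac{1}{q})^2$. The sketch below compares both sides exactly; the function name is hypothetical, states are labelled $0, \ldots, q-1$ with the root in state 0, and the formulas on the right-hand sides are the ones stated in the lemma as read here.

```python
from fractions import Fraction


def lemma_2_5_level_zero(q, lam):
    """Compare both sides of the Lemma 2.5 identities at n = 0.

    Requires q >= 3.  Returns a dict mapping a label to (computed value, formula value).
    """
    c = Fraction(1, q)
    x, z = 1 - c, (1 - c) ** 2
    # law of the child's spin given that the root is in state 0
    law = [c + lam * (1 - c)] + [c * (1 - lam)] * (q - 1)

    def expect(g):
        # y is the centred indicator vector (Y_i - 1/q)_i for the child's spin s
        return sum(w * g([Fraction(int(i == s)) - c for i in range(q)])
                   for s, w in enumerate(law))

    return {
        "mean_1": (expect(lambda y: y[0]), lam * x),
        "square_1": (expect(lambda y: y[0] ** 2), lam * z + (1 - lam) * x / q),
        "mean_i": (expect(lambda y: y[1]), -lam * x / (q - 1)),
        "square_i": (expect(lambda y: y[1] ** 2),
                     (1 + lam / (q - 1)) * x / q - lam * z / (q - 1)),
        "cov_1i": (expect(lambda y: y[0] * y[1]),
                   -lam * z / (q - 1) - (1 - lam) * x / (q * (q - 1))),
        "cov_ii": (expect(lambda y: y[1] * y[2]),
                   (2 * lam * z - (q - 2 + 2 * lam) * x / q) / ((q - 1) * (q - 2))),
    }
```

Agreement at $n = 0$ does not prove the lemma, but it catches sign and normalisation slips in the formulas.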



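2.2. Taylor Series Bounds. In the following lemma we calculate expected
values of monomials of the $Z_i$ by expanding them using Taylor series
approximations.

Lemma 2.6. For each positive integer $k$, there exists a $C = C(q, k)$ not
depending on $\lambda$ or $d$ such that for each $0 \le k_1, \ldots, k_q \le k$,

\[ E \prod_{i=1}^{q} Z_i^{k_i} \le C \]

and

\[ \Big| E \prod_{i=1}^{q} Z_i^{k_i} - 1 - d \Big( E \prod_{i=1}^{q} \Big(1 + \lambda q \Big(Y_{i1} - \frac{1}{q}\Big)\Big)^{k_i} - 1 \Big) \Big| \le C x_n^2 \]

and

\[ \Big| E \prod_{i=1}^{q} Z_i^{k_i} - 1 - d \Big( E \prod_{i=1}^{q} \Big(1 + \lambda q \Big(Y_{i1} - \frac{1}{q}\Big)\Big)^{k_i} - 1 \Big) - \frac{d(d-1)}{2} \Big( E \prod_{i=1}^{q} \Big(1 + \lambda q \Big(Y_{i1} - \frac{1}{q}\Big)\Big)^{k_i} - 1 \Big)^2 \Big| \le C x_n^3. \]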

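Proof. Recall that

\[ Z_i = \prod_{j=1}^{d} \Big( 1 + \lambda q \Big(Y_{ij} - \frac{1}{q}\Big) \Big) \]

so each $Z_i$ is a product of independent and identically distributed terms and
that

\[ E \prod_{i=1}^{q} Z_i^{k_i} = \Big( E \prod_{i=1}^{q} \Big( 1 + \lambda q \Big(Y_{i1}(n) - \frac{1}{q}\Big) \Big)^{k_i} \Big)^{d}. \]

As such we begin with a simple bound on $(1 + y)^d$ using Taylor series.
Suppose that $d|y| \le C'$ for some constant $C' > 0$. Then we have that,

\begin{align*}
\Big| (1 + y)^d - \sum_{i=0}^{\ell} \binom{d}{i} y^i \Big| &= \Big| \sum_{i=\ell+1}^{d} \binom{d}{i} y^i \Big| \\
&\le \sum_{i=\ell+1}^{d} \frac{d^i}{i!} |y|^i \\
&\le e^{d|y|} - \sum_{i=0}^{\ell} \frac{(d|y|)^i}{i!} \\
&\le C'' (d|y|)^{\ell+1} \tag{2.10}
\end{align*}

where the third inequality follows by Taylor's Theorem since $\max_{x \le C'} \frac{d^{\ell+1}}{dx^{\ell+1}} e^x = e^{C'}$.

Suppose that $s_1, \ldots, s_q$ are nonnegative integers. If for some $\ell$, $s_\ell \ge 2$
then since by definition $0 \le Y_{ij} \le 1$, by Lemma 2.2,

\[ \Big| E \prod_{i=1}^{q} \Big(Y_{i1} - \frac{1}{q}\Big)^{s_i} \Big| \le E\Big(Y_{\ell 1} - \frac{1}{q}\Big)^2 \le x_n. \tag{2.11} \]

If for distinct integers $\ell, \ell'$ we have $s_\ell = s_{\ell'} = 1$ then again by Lemma 2.2,

\[ \Big| E \prod_{i=1}^{q} \Big(Y_{i1} - \frac{1}{q}\Big)^{s_i} \Big| \le E \Big| \Big(Y_{\ell 1} - \frac{1}{q}\Big)\Big(Y_{\ell' 1} - \frac{1}{q}\Big) \Big| \le E \Big[ \Big(Y_{\ell 1} - \frac{1}{q}\Big)^2 + \Big(Y_{\ell' 1} - \frac{1}{q}\Big)^2 \Big] \le x_n. \tag{2.12} \]

Finally if $s_\ell = 1$ and $s_i = 0$ for all $i \ne \ell$ then by Lemma 2.5,

\[ \Big| E \prod_{i=1}^{q} \Big(Y_{i1} - \frac{1}{q}\Big)^{s_i} \Big| = \Big| E Y_{\ell 1} - \frac{1}{q} \Big| \le |\lambda| x_n. \tag{2.13} \]

Then applying equations (2.11), (2.12) and (2.13),

\begin{align*}
\Big| E \prod_{i=1}^{q} \Big( 1 + \lambda q \Big(Y_{i1} - \frac{1}{q}\Big) \Big)^{k_i} - 1 \Big| &= \Big| \sum_{(s_1, \ldots, s_q)} E \prod_{i=1}^{q} \binom{k_i}{s_i} \lambda^{s_i} q^{s_i} \Big(Y_{i1} - \frac{1}{q}\Big)^{s_i} - 1 \Big| \\
&= \Big| E \sum_{i=1}^{q} k_i \lambda q \Big(Y_{i1} - \frac{1}{q}\Big) + \sum_{(s_1, \ldots, s_q),\ \sum_i s_i \ge 2} E \prod_{i=1}^{q} \binom{k_i}{s_i} \lambda^{s_i} q^{s_i} \Big(Y_{i1} - \frac{1}{q}\Big)^{s_i} \Big| \\
&\le C' \lambda^2 x_n
\end{align*}

where the sum runs over all $q$-tuples of nonnegative integers $(s_1, \ldots, s_q)$ with
$s_i \le k_i$ for all $i$ and the constant $C'$ depends only on $q$ and $k_1, \ldots, k_q$. The
final inequality in the last equation follows from equations (2.11), (2.12) and
(2.13) since every term is bounded by $C'' \lambda^2 x_n$ where $C''$ depends only on $q$
and $k$. Since $0 \le x_n \le 1$ and $\lambda^2 d \le 1$, applying equation (2.10) with

\[ y = E \prod_{i=1}^{q} \Big( 1 + \lambda q \Big(Y_{i1} - \frac{1}{q}\Big) \Big)^{k_i} - 1 \]

completes the result.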
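The elementary bound behind (2.10) can be checked numerically. The sketch below compares the remainder of the truncated binomial expansion of $(1 + y)^d$ with the remainder of the truncated exponential series, under the reading of (2.10) in which the middle bound is $e^{d|y|} - \sum_{i \le \ell} (d|y|)^i / i!$. The function names are hypothetical and floating point is used, so comparisons should allow a small tolerance.

```python
import math


def taylor_remainder(d, y, ell):
    """|(1 + y)**d minus the binomial expansion truncated after the y**ell term|."""
    partial = sum(math.comb(d, i) * y ** i for i in range(ell + 1))
    return abs((1 + y) ** d - partial)


def exponential_bound(d, y, ell):
    """e**(d|y|) minus its Taylor polynomial of degree ell in the variable d|y|."""
    t = d * abs(y)
    return math.exp(t) - sum(t ** i / math.factorial(i) for i in range(ell + 1))
```

The comparison works because $\binom{d}{i} \le d^i / i!$ term by term, which is the second inequality in the chain.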



2.3. Main Expansion. In order to evaluate the expected value of $X^+(n+1)$
using equation (2.3) we expand it out using the identity

\[ \frac{a}{s + r} = \frac{a}{s} - \frac{ar}{s^2} + \frac{r^2}{s^2} \frac{a}{s + r}. \tag{2.14} \]
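With this expansion and $a = Z_1$, $s = q$ and $r = (\sum_{i=1}^{q} Z_i) - q$ clearly,

\[ x_{n+1} = E \frac{Z_1}{\sum_{i=1}^{q} Z_i} - \frac{1}{q} = E \frac{Z_1 - 1}{q} - E \frac{Z_1 \big( (\sum_{i=1}^{q} Z_i) - q \big)}{q^2} + E \frac{Z_1}{\sum_{i=1}^{q} Z_i} \frac{\big( (\sum_{i=1}^{q} Z_i) - q \big)^2}{q^2}. \tag{2.15} \]

We estimate the expected value of each of the terms in the preceding equation.
First

\[ EZ_1 = 1 + d\lambda q E\Big(Y_{11} - \frac{1}{q}\Big) + \frac{d(d-1)}{2} \Big( \lambda q E\Big(Y_{11} - \frac{1}{q}\Big) \Big)^2 + R_1 \tag{2.16} \]

where by Lemma 2.6 the error term satisfies $|R_1| \le C_1 x_n^3$ where $C_1$ does
not depend on $\lambda$, $d$ or $x_n$. Next applying Lemma 2.6 and Lemma 2.5 and
cancelling terms

\begin{align*}
EZ_1 \Big( \sum_{i=1}^{q} Z_i - q \Big) &= EZ_1^2 + \sum_{i=2}^{q} EZ_1 Z_i - qEZ_1 \\
&= \frac{d(d-1)}{2} \lambda^4 q^2 \Big[ \big( (3 - \lambda) x_n + \lambda q z_n \big)^2 + \frac{1}{q-1} \big( (q - 3 + \lambda) x_n - \lambda q z_n \big)^2 - q x_n^2 \Big] + R_2 \tag{2.17}
\end{align*}

where by Lemma 2.6 $|R_2| \le C_2 x_n^3$ and $C_2$ does not depend on $\lambda$, $d$ or $x_n$.
Finally again using Lemma 2.6 and Lemma 2.5,

\begin{align*}
E\Big( \Big( \sum_{i=1}^{q} Z_i \Big) - q \Big)^2 &= EZ_1^2 + \sum_{i=2}^{q} EZ_i^2 + 2 \sum_{i=2}^{q} EZ_1 Z_i + 2 \sum_{i_1=2}^{q} \sum_{i_2=i_1+1}^{q} EZ_{i_1} Z_{i_2} - 2qEZ_1 - 2q \sum_{i=2}^{q} EZ_i + q^2 \\
&= \frac{d(d-1)}{2} \lambda^4 q^2 \Big[ \big( (3 - \lambda) x_n + \lambda q z_n \big)^2 + \frac{3}{q-1} \big( (q - 3 + \lambda) x_n - \lambda q z_n \big)^2 - 2q x_n^2 - \frac{2q}{q-1} x_n^2 \\
&\qquad + \frac{1}{(q-1)(q-2)} \big( (3q - 6 + 2\lambda) x_n - 2\lambda q z_n \big)^2 \Big] + R_3 \tag{2.18}
\end{align*}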
where by Lemma 2.6 $|R_3| \le C_3 x_n^3$ and $C_3$ does not depend on $\lambda$, $d$ or $x_n$.
By Lemma 2.2 we have that $0 \le z_n \le x_n$ and since $|\lambda| \le 1$ the expressions
in equations (2.17) and (2.18) are both bounded by $C \frac{d(d-1)}{2} \lambda^4 x_n^2$ where $C$
depends only on $q$. Now using the fact that $0 \le \frac{Z_1}{\sum_{i=1}^{q} Z_i} \le 1$ and substituting
equations (2.16), (2.17) and (2.18) into equation (2.15) we have that

\[ |x_{n+1} - d\lambda^2 x_n| \le C_q \lambda^4 \frac{d(d-1)}{2} x_n^2 \le C_q x_n^2 \tag{2.19} \]

where $C_q$ depends only on $q$ since $\lambda^2 d \le 1$. In order to complete the proof we
will need a more precise bound. To motivate the rest of the proof suppose
that we could establish the following condition:

Condition 2.7. Suppose the following holds:

• That $z_n = (\frac{1}{q} + o(1)) x_n$,

• That $\frac{Z_1}{\sum_{i=1}^{q} Z_i}$ is sufficiently concentrated around $\frac{1}{q}$ so that

\[ E \frac{Z_1}{\sum_{i=1}^{q} Z_i} \frac{\big( (\sum_{i=1}^{q} Z_i) - q \big)^2}{q^2} = \Big( \frac{1}{q} + o(1) \Big) E \frac{\big( (\sum_{i=1}^{q} Z_i) - q \big)^2}{q^2}. \]

If we established Condition 2.7 then by substituting equations (2.16),
(2.17) and (2.18) into equation (2.15) we would have that

\[ x_{n+1} = d\lambda^2 x_n + (1 + o(1)) \frac{d(d-1)}{2} \frac{q(q-4)}{q-1} \lambda^4 x_n^2. \tag{2.20} \]

Proving Condition 2.7 is one of the main technical challenges in this paper.
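The algebra leading from (2.16), (2.17) and (2.18) to the coefficient $\frac{q(q-4)}{q-1}$ in (2.20) can be verified with exact rational arithmetic. The sketch below is a check under stated assumptions rather than part of the proof: it sets $z_n = x_n / q$ and replaces $Z_1 / \sum_i Z_i$ by $\frac{1}{q}$ in the last term of (2.15), that is, it uses Condition 2.7 with the $o(1)$ terms dropped, and it takes the second order brackets to be the squares of the three linear forms named `a`, `b` and `c` in the code. The function name is hypothetical, and $q \ge 3$ is required.

```python
from fractions import Fraction


def second_order_coefficient(q, lam, x=Fraction(1)):
    """Coefficient of d(d-1)/2 * lam**4 * x**2 obtained by combining (2.15)-(2.18)."""
    z = x / q                                        # Condition 2.7, first bullet, o(1) dropped
    a = (3 - lam) * x + lam * q * z                  # linear form arising from E Z_1^2
    b = (q - 3 + lam) * x - lam * q * z              # from E Z_1 Z_i and E Z_i^2 with i != 1
    c = (3 * q - 6 + 2 * lam) * x - 2 * lam * q * z  # from E Z_i Z_i' with distinct i, i' != 1
    t16 = q * x * x                                  # second order part of (E Z_1 - 1) / q
    t17 = a * a + b * b / (q - 1) - q * x * x        # bracket of (2.17); enters with a minus sign
    t18 = (a * a + 3 * b * b / (q - 1) + c * c / ((q - 1) * (q - 2))
           - 2 * q * x * x - 2 * q * x * x / (q - 1))  # bracket of (2.18); enters with weight 1/q
    return (t16 - t17 + t18 / q) / (x * x)
```

With $z_n = x_n / q$ the dependence on $\lambda$ cancels inside each linear form, which is why the resulting coefficient depends on $q$ alone.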

2.4. Concentration Lemmas. In this subsection we establish a number
of lemmas in order to establish Condition 2.7. The following lemma
follows immediately from equation (2.19).

Lemma 2.8. For any $\epsilon > 0$, there exists a constant $\delta = \delta(q, \epsilon)$ such that
for all $n$, if $x_n < \delta$ then

\[ |x_{n+1} - d\lambda^2 x_n| \le \epsilon x_n. \]

The following lemma ensures that the decrease from $x_n$ to $x_{n+1}$ is never
too large.

Lemma 2.9. For any $\kappa > 0$ there exists a constant $\gamma = \gamma(q, \kappa, d) > 0$ such
that for all $n$ when $\kappa < |\lambda|$,

\[ x_{n+1} \ge \gamma x_n. \]

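Proof. For a configuration $A$ on $T_{u_1} \cap S(n+1)$ define

\[ f^*_{n+1}(i, A) = P(\sigma_\rho = i \mid \sigma_1(n+1) = A); \]

that is the probability the root is in state $i$ given the configuration on the
leaves in $T_{u_1} \cap S(n+1)$. Now

\[ f^*_{n+1}(i, A) = M_{ii} f_n(i, A) + \sum_{l \ne i} M_{il} f_n(l, A) = \frac{1}{q} \Big( 1 + \lambda q \Big( f_n(i, A) - \frac{1}{q} \Big) \Big) \]

and so

\[ E f^*_{n+1}(1, \sigma^1_1(n+1)) = \frac{1}{q} + \lambda E\Big(Y_{11} - \frac{1}{q}\Big) = \frac{1}{q} + \lambda^2 x_n. \]

The estimator that chooses a state with probability $f^*_{n+1}(i, \sigma_1(n+1))$ correctly
reconstructs the root with probability $\frac{1}{q} + \lambda^2 x_n$. Since this probability must
be less than that of the MLE it follows by Lemma 2.3 that

\[ \lambda^2 x_n + \frac{1}{q} \le p_{n+1} \le x_{n+1}^{1/2} + \frac{1}{q} \]

and so $x_{n+1} \ge \lambda^4 x_n^2 \ge \kappa^4 x_n^2$ for any value of $x_n$. Now when $x_n < \delta$ by Lemma
2.8 it follows that

\[ x_{n+1} \ge (d\lambda^2 - \epsilon) x_n. \]

Combining these results completes the proof. ■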

2.4.1. Concentration. We will establish some concentration results which
will be required in order to make the approximation

\[ \frac{Z_1}{\sum_{i=1}^{q} Z_i} \approx \frac{1}{q}. \]

The first lemma establishes a technical uniqueness result where the set of
vertices which can be conditioned is limited to a set of $k$ vertices.

Lemma 2.10. For any $\epsilon > 0$ and positive integer $k$ there exists $\Lambda =
\Lambda(q, d, \epsilon, k)$ not depending on $\lambda$ such that for any collection of vertices $v_1, \ldots, v_k \in
S(\Lambda)$,

\[ \sup_{i, i_1, \ldots, i_k \in \mathcal{C}} \Big| P(\sigma_\rho = i \mid \sigma_{v_j} = i_j, 1 \le j \le k) - \frac{1}{q} \Big| \le \epsilon. \]

Proof. This lemma simply says that fixing the spins at k distant vertices a 
long way from the root has only a small effect on the root. We note that 

^' ^ [| — |A* otherwise, 

and so since X^d < 1, 
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Let 7 be an integer sufficiently large such that 

< 1 + e. 



Fix an integer A such that A > kj. Now choose any vi,...,Vk € S{A) 
with d{vi,p) = A. For < £ < A define ae to be the number of vertices 
distance i from the root with a decedent in the set {vi, . . . ,Vk}, that is 
ai = i^{v G S{£) : \Ty n {vi, . . . ,Vk}\ > 0}. Then ao = l,aA = k and 
the ai are increasing and integer valued. Therefore there must be some i 
such that ae = a^+-y. Let wi, . . . ,Wa( denote the vertices in the set {v G 
S{i) : \Ty n {f 1, . . . , Vk}\ > 0} and wi, . . . , Wa^ denote the vertices in the set 
{v G S{i + 7) : {Tyf] {vi, . . . ,Vk}\ > 0} such that Wj is the descendent of Wj. 
By the Markov random field property the ayj. are conditionally independent 
given the ayj. . The distribution of a^j given a^j is 

P{(rwj = i2Wwj = ii) = M7^_.^. 

By Bayes Rule and the Markov random field property we have that for any 
i,i ,ii, . . . , € C, 

P[ap = i\ a^. = ij, l<j< a() 

P{ap = i'\aw. =ij,l<j < ae) 

_ P{awj = ij, l< j < ae\crp = i) 
P{(^wj = i'j, 1 < i < aeWp = V) 

_ ^hi,...,ha,&cPi^j (^wj = ijl^j (Tw^ = hj)P{\/j ajs^ = hj\(Tp = ?;) 
E/xi,...,fe„,6C Pi^j "^^j = = hj)P{yj (Tw, = hj\ap = i') 

J2hu-,ha,&c ^(^w, = hj, 1 < J < ae\ap = i) l\T=i ■ 



^hu...,ha,ec Pi^T,, = hj, l<j< ae\ap = i') Y[]U M^^,^ 
Efti,...,ft„,gC P{<^w, = hj, l<j< ae\ap = i){\+ d-^/^y 



Eh,,...,h.,^c Pi^^, = hj, l<j< ae\ap = i') (i - 



(i-d-./^) 

< l + £. 



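so it follows that

\[ P(\sigma_\rho = i \mid \sigma_{\tilde w_j} = i_j,\ 1 \le j \le a_\ell) \le \frac{1}{q}(1 + \varepsilon) \]

and

\[ P(\sigma_\rho = i \mid \sigma_{\tilde w_j} = i_j,\ 1 \le j \le a_\ell) \ge \frac{1}{q}\,\frac{1}{1 + \varepsilon} \ge \frac{1}{q}(1 - \varepsilon). \]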

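By the Markov random field property, since $\sigma_\rho$ is conditionally independent
of the collection $\sigma_{v_1}, \dots, \sigma_{v_k}$ given the spins $\sigma_{\tilde w_1}, \dots, \sigma_{\tilde w_{a_\ell}}$, it follows that

\[ \sup \left| P(\sigma_\rho = i \mid \sigma_{v_j} = i_j,\ 1 \le j \le k) - \frac{1}{q} \right| \le \sup \left| P(\sigma_\rho = i \mid \sigma_{\tilde w_j} = i_j,\ 1 \le j \le a_\ell) - \frac{1}{q} \right| \le \varepsilon \]

which completes the result.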



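The next lemma establishes concentration of the posterior distributions
when $x_n$ is small.

Lemma 2.11. For any $\varepsilon, \alpha, \kappa > 0$ there exists $C = C(q, d, \varepsilon, \alpha, \kappa)$ and
$N = N(q, d, \varepsilon, \alpha, \kappa)$ such that for any $\lambda$ with $\kappa < |\lambda| \le d^{-1/2}$ and for
$n > N$,

\[ P\left( \left| \frac{Z_1}{\sum_{i=1}^{q} Z_i} - \frac{1}{q} \right| > \varepsilon \right) \le C x_n^{\alpha}. \]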

Proof. The conclusion is trivially true if both $C$ and $x_n$ are large so we will
suppose that $x_n$ is small. Fix $k$ an integer such that $k > \alpha$. Choose $\Lambda$ large
enough so that the conclusion of Lemma 2.10 holds with bound $\varepsilon/2$ and set
$N = \Lambda$. Let $v_1, \dots, v_{|S(\Lambda)|}$ denote the vertices in $S(\Lambda)$. Let $\sigma^{v}(n + 1)$ denote
the spins of the vertices in $T_v \cap S(n + 1)$ and define

\[ W(i, v) = f_{n+1-\Lambda}(i, \sigma^{v}(n + 1)) \]

which is the conditional probability that $\sigma_v$ is in state $i$ given the boundary
condition $\sigma^{v}(n + 1)$. Conditional on $\sigma(\Lambda)$, the spins of $S(\Lambda)$, the $W(i, v)$ are
distributed as

\[ W(i, v) \sim \begin{cases} X^{+}(n + 1 - \Lambda) & \sigma_v = i, \\ X^{-}(n + 1 - \Lambda) & \sigma_v \ne i. \end{cases} \]

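Conditional on $\sigma(\Lambda)$ the vectors $(W(1, v), \dots, W(q, v))$ are conditionally
independent for different $v \in S(\Lambda)$. Using the recursion of equation (2.2) a
posterior probability of a vertex can be written as a function of the posterior
probabilities of its children so there exists a function $g_\Lambda(\mathcal{W})$ such that,

\[ \frac{Z_1}{\sum_{i=1}^{q} Z_i} = f_{n+1}(1, \sigma^{1}(n + 1)) = g_\Lambda(\mathcal{W}) \]

where $\mathcal{W}$ denotes the vector

\[ \mathcal{W} = \left( W(1, v_1), \dots, W(1, v_{|S(\Lambda)|}), W(2, v_1), \dots, W(q, v_{|S(\Lambda)|}) \right). \]

When $x_n$ is small we expect most of the $W(i, v)$ to be close to $\frac{1}{q}$. If all the
entries in $\mathcal{W}$ are identically $\frac{1}{q}$ then $g_\Lambda(\mathcal{W}) = \frac{1}{q}$. It follows by Lemma 2.10
that if there are at most $k$ vertices $v \in S(\Lambda)$ such that for some $1 \le i \le q$,
$W(i, v) \ne \frac{1}{q}$ then

\[ \left| g_\Lambda(\mathcal{W}) - \frac{1}{q} \right| \le \varepsilon/2. \]

Observe that $g_\Lambda$ is a continuous function of each of the elements of the vector
$\mathcal{W}$ and of $\lambda$. It follows that there exists a $\delta > 0$ such that if $\mathcal{W}$ satisfies

\[ \#\left\{ v \in S(\Lambda) : \max_{1 \le i \le q} \left| W(i, v) - \frac{1}{q} \right| > \delta \right\} \le k \]

then

\[ \left| g_\Lambda(\mathcal{W}) - \frac{1}{q} \right| \le \varepsilon. \]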



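The random variables $\max_{1 \le i \le q} \left| W(i, v) - \frac{1}{q} \right|$ are independent: they
are conditionally independent given $\sigma(\Lambda)$ and, by the symmetry of the model,
their distribution does not in fact depend on the spins in $S(\Lambda)$. By Chebyshev's inequality
and Lemma 2.2 we have that

\[ P\left( \max_{1 \le i \le q} \left| W(i, v) - \frac{1}{q} \right| > \delta \right) \le P\left( \left| X^{+}(n + 1 - \Lambda) - \frac{1}{q} \right| > \delta \right) + (q - 1) P\left( \left| X^{-}(n + 1 - \Lambda) - \frac{1}{q} \right| > \delta \right) \]
\[ \le \delta^{-2} \left[ E\left( X^{+}(n + 1 - \Lambda) - \frac{1}{q} \right)^2 + (q - 1) E\left( X^{-}(n + 1 - \Lambda) - \frac{1}{q} \right)^2 \right]. \]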



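As noted above we may suppose that $x_n$ is very small so these events are
rare. In particular we have that

\[ P\left( \left| \frac{Z_1}{\sum_{i=1}^{q} Z_i} - \frac{1}{q} \right| > \varepsilon \right) \le P\left( \#\left\{ j : \max_{1 \le i \le q} \left| W(i, v_j) - \frac{1}{q} \right| > \delta \right\} > k \right) \]
\[ \le P\left( \mathrm{Binom}\left( |S(\Lambda)|, \delta^{-2} x_{n+1-\Lambda} \right) > k \right) \le C x_{n+1-\Lambda}^{\alpha} \le C x_n^{\alpha} \]

where the third inequality holds for large enough $C$ and the final inequality
follows by Lemma 2.9 which completes the proof. Only in this final
inequality do we use the assumption that $\kappa < |\lambda|$.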
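The binomial step in the proof above rests on the fact that, for a fixed number of trials, the upper tail of a binomial variable is polynomially small in the success probability. A minimal Python sketch of this fact follows. It is an illustration rather than the paper's argument: it uses the elementary union bound $P(\mathrm{Binom}(m, p) > k) \le \binom{m}{k+1} p^{k+1}$, and the function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(m, p, k):
    """Exact P(Binom(m, p) > k), summed term by term."""
    return sum(comb(m, j) * p ** j * (1.0 - p) ** (m - j)
               for j in range(k + 1, m + 1))


def subset_union_bound(m, p, k):
    """Union bound over all (k+1)-subsets of trials that could all succeed."""
    return comb(m, k + 1) * p ** (k + 1)
```

With $m = |S(\Lambda)|$ fixed and $p$ proportional to $x_{n+1-\Lambda}$, the bound is a constant times $p^{k+1}$, which is why choosing $k > \alpha$ suffices.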
To establish the necessary concentration results we will make use of Bennett's
inequality which is stated below (see e.g. [19], Appendix B, Lemma 4).

Lemma 2.12. Let $W_1, \dots, W_n$ be independent mean 0 random variables satisfying
$W_i \le M$ and let $b^2 = \sum_{i=1}^{n} E(W_i^2)$. Then for any $\eta \ge 0$,

\[ P\left( \sum_{i=1}^{n} W_i > \eta \right) \le \exp\left( -\frac{b^2}{M^2}\, \theta\left( \frac{M \eta}{b^2} \right) \right) \]

where $\theta(x) = (1 + x) \log(1 + x) - x$.
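To make the shape of Bennett's bound concrete, here is a small Python sketch. It assumes the standard form of the inequality, $P(\sum_i W_i \ge \eta) \le \exp(-\frac{b^2}{M^2}\theta(\frac{M\eta}{b^2}))$, and compares it with the exact tail of a sum of independent random signs (mean 0, bounded by $M = 1$, with $b^2 = n$). The function names are illustrative and the random-sign example is not from the paper.

```python
import math
from math import comb


def theta(x):
    """theta(x) = (1 + x) log(1 + x) - x."""
    return (1.0 + x) * math.log1p(x) - x


def bennett_bound(eta, b_squared, M):
    """Bennett upper bound on P(sum W_i >= eta) for mean-zero W_i <= M."""
    return math.exp(-(b_squared / M ** 2) * theta(M * eta / b_squared))


def rademacher_tail(n, eta):
    """Exact P(S_n >= eta) where S_n is a sum of n independent +/-1 signs."""
    favourable = sum(comb(n, j) for j in range(n + 1) if 2 * j - n >= eta)
    return favourable / 2 ** n
```

For example, with $n = 20$ and $\eta = 10$ the exact tail is about $0.021$ while the bound is about $0.115$, so the bound holds with room to spare.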

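The following concentration result holds uniformly provided $\lambda$ is small
enough. It is necessary in taking limits for large $d$.

Lemma 2.13. For any $0 < \varepsilon < 1$ and $\alpha > 1$ there exists $C = C(q, \varepsilon, \alpha)$ and
$N = N(q, \varepsilon, \alpha)$ depending only on $q$, $\alpha$ and $\varepsilon$ such that whenever $|\lambda| q \le \frac{1}{2}$
and

\[ |\lambda| q + \lambda^2 q^2 \le \frac{\max\{-\log(1 - \varepsilon), \log(1 + \varepsilon)\}}{4\alpha} \]

then for $1 \le i \le q$ and $n > N$,

\[ P(|Z_i(n) - 1| > \varepsilon) \le C x_n^{\alpha}. \]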

Proof. Observe that the hypothesis only holds when $|\lambda|$ is small, that is the
interactions are weak enough. Let

\[ M = \frac{\max\{-\log(1 - \varepsilon), \log(1 + \varepsilon)\}}{4\alpha}. \]

By taking C large enough we can assume that 

Xn < y min{-log(l - g),log(l +g)}, 

since otherwise the conclusion is trivial. 

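Since $1 - 2y \le \frac{1}{1+y} \le 1$ when $0 \le y \le \frac{1}{2}$ and $1 - 2y \ge \frac{1}{1+y} \ge 1$ when
$-\frac{1}{2} \le y \le 0$, by integrating it follows that when $|y| \le \frac{1}{2}$,

\[ y - y^2 \le \log(1 + y) \le y. \qquad (2.22) \]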
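The two-sided bound (2.22), $y - y^2 \le \log(1 + y) \le y$ for $|y| \le \frac{1}{2}$, is easy to sanity-check on a grid. The sketch below is only a numerical illustration (the function name and the floating-point tolerance are choices made here, not part of the paper) and does not replace the integration argument.

```python
import math


def log_sandwich_holds(y, tol=1e-15):
    """Check y - y^2 <= log(1 + y) <= y up to a floating-point tolerance."""
    value = math.log1p(y)
    return (y - y * y <= value + tol) and (value <= y + tol)


def check_grid(points=1000):
    """Test the sandwich at points + 1 equally spaced y in [-1/2, 1/2]."""
    return all(log_sandwich_holds(-0.5 + k / points) for k in range(points + 1))
```

The lower bound is tight only near $y = 0$; at $y = -\frac{1}{2}$ it reads $-0.75 \le \log\frac{1}{2} \approx -0.693$.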

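Taking $y = \lambda q (Y_{1j} - \frac{1}{q})$ then,

\[ -M \le -|\lambda q| - \lambda^2 q^2 \le \lambda q \left( Y_{1j} - \frac{1}{q} \right) - \lambda^2 q^2 \left( Y_{1j} - \frac{1}{q} \right)^2 \le \log\left( 1 + \lambda q \left( Y_{1j} - \frac{1}{q} \right) \right), \]

and

\[ \log\left( 1 + \lambda q \left( Y_{1j} - \frac{1}{q} \right) \right) \le \lambda q \left( Y_{1j} - \frac{1}{q} \right) \le |\lambda q| \le M. \]

Let

\[ W_j = \lambda q \left( Y_{1j} - \frac{1}{q} \right) - \lambda^2 q^2 \left( Y_{1j} - \frac{1}{q} \right)^2 \]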
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and so by Lemma 12.51 

EWj = X^qxn - X^q^Zn < \\\^qXn 

and -{Wj - EWj) < M + \X\^q < 2M. Also EWj = X^qXn - X^q^Zn > 
— \X\^q'^Xn so dEWj > —q^Xn- Since by definition, < Yij < 1, our as- 
sumption that \X\q < ^ implies that \Xq{Yij — |)| < ^- From the inequality 
(a + 6)2 < + 26^ and Lemma [23] it follows that 

E{Wj-EWjf < EWj < 2E (^Xq{Yij - +2E (^Xq{Yij - < AX^q'^Xn- 

and so if S = Y.'j=i ^(^j - EWj f then B < AdX'^q^Xn < Vx„ since 
dX'^ <1. Now 



P{Zi <l-e) 



p|^X^log(l + Ag(yy-i)) 



-) < log(l - e) 



<P\^W,<log{l-e) 
/ d 



< 



exp 



B /(-ilog(l-£))2M^ 



4M2 



B 



(2.23) 



where the first inequality follows from the equation (j2.22p . the second from 
the fact that dEWj > —q^Xn, the third from our assumption that Xn < 

2 

^ max{— log(l — e), log(l + e)} and the final inequality by applying Lemma 

Since ^0{x) is increasing in x the right hand side of equation (I2.23P is 
increasing in B and hence substituting B < Aq'^Xn gives, 

Va^n^ /-log(l-e)M^ 



P (Zi < 1 - e) < exp 

< exp 

< exp 

< CxZ. 



4M2 



Aq^Xf 



loEfl - e) f^^^ |^ -log(l -£)M I _ ^ 



4M 

log(l-g) 
4M 



Aq^Xn 



log(l-e) 
AM 



(2.24) 
where the second inequality uses the fact that $\theta(x) \ge x(\log(x) - 1)$. With
essentially the same argument we have $P(Z_1 > 1 + \varepsilon) \le C x_n^{\alpha}$. Furthermore
the result holds similarly for the other $Z_i$ as well which completes the result.



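Combining the results of this section the following corollary gives us the
concentration result we need.

Corollary 2.14. For any $0 < \varepsilon < 1$ and $\alpha > 1$ there exists $C = C(q, \varepsilon, \alpha)$
and $N = N(q, \varepsilon, \alpha)$ depending only on $q$, $\alpha$ and $\varepsilon$ such that for $1 \le i \le q$
and $n > N$,

\[ P\left( \left| \frac{Z_i}{\sum_{i=1}^{q} Z_i} - \frac{1}{q} \right| > \varepsilon \right) \le C x_n^{\alpha}. \qquad (2.25) \]

Proof. In light of Lemmas 2.11 and 2.13 we split the result into two
cases, when $|\lambda|$ is big and small. Let $\varepsilon'(q) > 0$ be small enough so that if for
all $i$, $|Z_i - 1| < \varepsilon'$ then

\[ \left| \frac{Z_i}{\sum_{i=1}^{q} Z_i} - \frac{1}{q} \right| < \varepsilon \]

and let

\[ M = \frac{\max\{-\log(1 - \varepsilon'), \log(1 + \varepsilon')\}}{4\alpha}. \]

For each fixed $d$ define

\[ \mathcal{K}_d = \left\{ \lambda : |\lambda| q < \tfrac{1}{2},\ |\lambda| q + \lambda^2 q^2 < M \right\}, \]

an open set which includes 0. Let $\mathcal{J}_d = [-d^{-1/2}, d^{-1/2}] \setminus \mathcal{K}_d$.

By Lemma 2.13 equation (2.25) holds with a bound $C = C'(q, \varepsilon, \alpha)$ not
depending on $\lambda$ or $d$, provided $\lambda \in \mathcal{K}_d$. For each fixed $d$ Lemma 2.11 implies
that equation (2.25) holds with a bound $C''_d = C''_d(q, \varepsilon, \alpha)$ not depending on
$\lambda$, provided $\lambda \in \mathcal{J}_d$. Since $\lambda^2 d \le 1$, for large enough $d$ so that $d > 4q^2$ and
$d^{-1/2} q + d^{-1} q^2 < M$ the set $\mathcal{J}_d$ is empty. It follows that equation (2.25)
holds with a bound

\[ C = \max\left\{ C', \max_{d \,:\, \mathcal{J}_d \ne \emptyset} C''_d \right\}, \]

that is independent of $\lambda$ and $d$.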



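2.5. Bound on $z_n - \frac{1}{q} x_n$. In this section we bound the term $z_n - \frac{1}{q} x_n$ when
$x_n$ is small.

Lemma 2.15. For any $\varepsilon, \kappa > 0$ there exists a $\delta = \delta(q, \kappa, d)$ and $k =
k(q, \kappa, d)$ such that if $x_n < \delta$ and $|\lambda| > \kappa$ then

\[ \left| \frac{z_{n+k}}{x_{n+k}} - \frac{1}{q} \right| < \varepsilon. \]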
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Zn+1 — E — 



E 



+ 



1 [Zl-q Ei=l Z, 

9^ clT^ 



(2.26) 



Expanding and using Lemma 12.61 and Lemma 12.51 we get that 

2 



dX^ [ (1 - A)-x„ + Xzn 
q 



Similarly 



< CqX^ 



and 



e[ -qA <Cqxl 



Substituting these bounds into equation (I2.26P and noting that 



{zi - \ Eti z^ 



{El=l Z^f 



< 1 



so we have that 



Zn+i - (1 - \)-Xn + \Zn 



Dividing by Xn+i we get 



(1_A)1 + A^ 

Q Xn 



Xn+1 



By Lemma ES] we have that < 7"^ and by equation (I2.19P I ^^^^ - 1| < 
C'J'^. It follows that 

1 Xn+l 



Xn+1 



V Q XnJ 



(2.27) 
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Iterating this equation we get that 



k 
1=1 



q Xn 



'1 _ x''-^)- + )^k-e^n±l. — (1 — A'^"^"'""^)- — _\^~^+^ ^n+e-i 



< 



k-i 



Xn+l 



q Xn+l-l 



Xn+£-l 



S '^q |A| Xn+l-l- 



(2.28) 



Iteratively applying Lemma 2.8 implies that if $\delta > 0$ is small enough and
$x_n < \delta$ then for $0 \le \ell \le k$, $x_{n+\ell} \le 2\delta$. Since $0 \le z_n \le x_n$ it follows from
equation (2.28) that



^n+k 1 



Xn+k 



By taking k sufficiently large and 6 sufficiently small we complete the result. 



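Corollary 2.16. For any $\varepsilon, \kappa > 0$ there exists a $\delta = \delta(q, \kappa, d)$ and $k =
k(q, \kappa, d)$ such that if $x_n < \delta$, $n > k$ and $|\lambda| > \kappa$ then

\[ \left| \frac{z_n}{x_n} - \frac{1}{q} \right| < \varepsilon. \]

Proof. By Lemma 2.9 if $x_n < \delta$ then $x_{n-k} \le \gamma^{-k} x_n$ and so the result follows
by Lemma 2.15.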



3. Reconstruction for $q \ge 5$

The lemmas proved in Subsections 2.4 and 2.5 establish Condition 2.7.
We now use these results to establish the change from $x_n$ to $x_{n+1}$ when $x_n$
is small.

Lemma 3.1. There exists a $\delta = \delta(q) > 0$ and $N = N(q)$ such that if $x_n < \delta$
and $n > N$ then

\[ x_{n+1} \ge d\lambda^2 x_n + \frac{1}{2}\,\frac{d(d - 1)}{2}\,\frac{q(q - 4)}{q - 1}\,\lambda^4 x_n^2. \]
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Proof. Let e > 0. Then 
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E 



<eE 



i((ELi^^)- 



E 



1 ((ELi 



1 ((ELi z. 



<eE 



+ P 



> e 



^1 



I]j=i Zi 



+ I E 



m=iz. 



1/2 



1 ((ELi 



(3.1) 



g 9^ 

where the second inequahty comes from the Cauchy-Schwartz inequahty and 
the third follows by Corollary 12.141 provided that n is sufficiently large while 
the fourth inequality follows by Lemma 12.61 

Now by substituting equations (|2.16p . (|2.17p and (j2.18p we have that 



j^Z, _ ^ Z,{{ZUZ,)-q) ^ ^l ((ELl^O 



d{d - 1) 



2g(g - 2) 2 

q-l 



^— ^((g - 3 + X)xn - \qznf - — ^ ((g - 3 + A)x„ - \qznf 



+ 



1 



q{q-\\q-2) 



q{q-^) 



((3g - 6 - 2A)x„ + 2Agz„)^ 



>i + dA^x„ + ^^fci^A^x^ 



^ _ 1 

Xn q 



where \R\ < Cx^ and C and C depend only on q. Let k — 
|A| < At then since < Zn < Xn, 



(3.2) 

s(2zlL then if 



^, d{d-l) ^^ 



Zfi 1 
a^n q 



< 



£n _ 1 

Xn q 

I did - 1) q{q - A) 2 



3 2 g- 1 



-AV 



(3.3) 
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When d > then we always have |A| < k because dX'^ < 1. For the finite 
number of cases when d < k? hy taking 5 to be sufficiently small and to 
be sufficiently large we may assume by Corollary 12.161 that when |A| > n 
and n > N then 

Zn 1 
< K. 

Xn q 

It follows that we may take equation (13. 3p to hold for all d and A. 

Now combining equations (I2.15p . (13. ip . ()3.2p and (13. 3p and taking 5 and 
e to be sufficiently small and sufficiently large we complete the result. 



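Proof. (Theorem 1.2)

We will prove the result for the ferromagnetic case; the anti-ferromagnetic
case will follow similarly. We will establish that when $\lambda$ is close enough to
$d^{-1/2}$ then $x_n$ does not converge to 0. First we will verify that $x_n$ does
not drop from a very large value to a very small one. Fix some $\kappa < d^{-1/2}$.
By Lemma 2.9 there exists $0 < \gamma < 1$ such that if $\kappa < \lambda < d^{-1/2}$ then
$x_{n+1} \ge \gamma x_n$. Now we use Lemma 3.1. We can take $\delta > 0$ and $N$ so that if
$n > N$ and $x_n < \delta$ then

\[ x_{n+1} \ge d\lambda^2 x_n + \frac{1}{2}\,\frac{d(d - 1)}{2}\,\frac{q(q - 4)}{q - 1}\,\lambda^4 x_n^2. \qquad (3.4) \]

Let $\varepsilon = \min\{\frac{1}{2}\gamma^{N+1}, \delta\gamma\} > 0$. Since $q - 4 > 0$ we can choose $\kappa < \lambda < d^{-1/2}$
such that

\[ 1 \le d\lambda^2 + \frac{1}{2}\,\frac{d(d - 1)}{2}\,\frac{q(q - 4)}{q - 1}\,\lambda^4 \varepsilon. \qquad (3.5) \]

We now show by induction that for all $n$, $x_n \ge \varepsilon$. Since $x_0 = 1 - \frac{1}{q} \ge \frac{1}{2}$,
we have $x_n \ge \frac{1}{2}\gamma^{n} \ge \varepsilon$ when $n \le N$, so suppose that $n > N$. Now if $x_n \ge \varepsilon\gamma^{-1}$
then $x_{n+1} \ge \gamma x_n \ge \varepsilon$. If $\varepsilon \le x_n \le \gamma^{-1}\varepsilon \le \delta$ then by Lemma 3.1 and
equation (3.5) we have that,

\[ x_{n+1} \ge x_n \left( d\lambda^2 + \frac{1}{2}\,\frac{d(d - 1)}{2}\,\frac{q(q - 4)}{q - 1}\,\lambda^4 \varepsilon \right) \ge x_n \ge \varepsilon. \]

It follows by induction that for all $n$, $x_n \ge \varepsilon$ which implies that $\lambda^{+} \le \lambda < d^{-1/2}$,
which establishes that the Kesten-Stigum bound is not tight.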
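The choice of $\lambda$ in the proof above can be made explicit. The Python sketch below solves for the smallest $\lambda > 0$ at which the growth factor $d\lambda^2 + c\,\frac{d(d-1)}{2}\frac{q(q-4)}{q-1}\lambda^4\varepsilon$ reaches 1, and this value is always strictly below the Kesten-Stigum value $d^{-1/2}$ when $q \ge 5$. Everything here is an assumption for illustration: the function names are hypothetical, the slack constant `slack` (the factor $c$, defaulting to $\frac{1}{2}$) reflects one reading of the lemma, and the numerical values of $d$, $q$ and $\varepsilon$ used in any call are arbitrary.

```python
import math


def quadratic_coefficient(d, q, slack=0.5):
    """Coefficient of lam^4 * x^2 in the assumed lower-bound recursion."""
    return slack * (d * (d - 1) / 2.0) * (q * (q - 4) / (q - 1.0))


def growth_factor(lam, d, q, eps, slack=0.5):
    """d lam^2 + coefficient * lam^4 * eps, the factor that must be >= 1."""
    return d * lam ** 2 + quadratic_coefficient(d, q, slack) * lam ** 4 * eps


def smallest_admissible_lambda(d, q, eps, slack=0.5):
    """Smallest lam > 0 with growth_factor(lam) = 1; requires q >= 5."""
    b = quadratic_coefficient(d, q, slack) * eps
    if b <= 0:
        raise ValueError("the quadratic coefficient must be positive (q >= 5)")
    # Positive root of b t^2 + d t - 1 = 0 in t = lam^2, written in the
    # rationalised form to avoid cancellation when b is small.
    t = 2.0 / (d + math.sqrt(d * d + 4.0 * b))
    return math.sqrt(t)
```

For instance, under these assumptions `smallest_admissible_lambda(10, 5, 0.01)` is slightly below $10^{-1/2}$, so there is a nonempty interval of $\lambda$ below the Kesten-Stigum value on which the induction goes through.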
4. Large degree asymptotics

In this section we will analyse what happens as we let $d$ grow. As $d$
increases the interactions become weaker and $\lambda$ decreases. We will parameterize
the interaction strengths with $\hat\lambda$ defined by $\hat\lambda = \hat\lambda(d) = \lambda d^{1/2}$. With
this parameterisation $\hat\lambda = 1$ corresponds to the Kesten-Stigum bound in the
ferromagnetic case while $\hat\lambda = -1$ corresponds to the Kesten-Stigum bound
in the antiferromagnetic case. We will, therefore, restrict our attention to
$|\hat\lambda| \le 1$. We define

\[ U_{ij} = \log\left( 1 + \lambda q \left( Y_{ij} - \frac{1}{q} \right) \right) \]

and denote $U_j = (U_{1j}, \dots, U_{qj}) \in \mathbb{R}^q$. We have the following estimates on
the means and covariances of the $U_{ij}$.

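Lemma 4.1. There exist constants $C$ and $d'$ depending only on $q$ such that
when $d > d'$,

\[ \left| d\,E U_{1j} - \frac{1}{2}\hat\lambda^2 q x_n \right| \le C d^{-1/2}, \qquad (4.1) \]

and for $i \ge 2$,

\[ \left| d\,E U_{ij} + \left( \frac{1}{2} + \frac{1}{q - 1} \right) \hat\lambda^2 q x_n \right| \le C d^{-1/2}. \qquad (4.2) \]

For any $1 \le i \le q$,

\[ \left| d\,\mathrm{Var}(U_{ij}) - \hat\lambda^2 q x_n \right| \le C d^{-1/2} \qquad (4.3) \]

and for $1 \le i_1 < i_2 \le q$,

\[ \left| d\,\mathrm{Cov}(U_{i_1 j}, U_{i_2 j}) + \frac{1}{q - 1} \hat\lambda^2 q x_n \right| \le C d^{-1/2}. \qquad (4.4) \]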



Proof. Using the Taylor series expansion of log(l-l-w), there exists a constant 
> such that when \w\ < W then | log(l + w) — w + ^w'^\ < \w\^ . Since 
by definition < Yij < 1 by taking d' to be sufficiently large we may assume 
that \Xq{Yij - i)| < \X\q < W since |A| < d^^/"^. Then by LemmaESJ 



E 



U- 



i^-HY^j--) + lxViY^,--? 



< E\XfQ^\Yi 



1, 



< d-'-^/^q^ElYii - -|3 
q 



< q^d-^/\ 



(4.5) 
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Now since by Lemma 12.2^ < 2;„ < < 1 and applying the identities of 
Lemma 12.5^ 



q 



= ^|A|V 

1 9 , 3 



1 



XZn + -{l 

q 



\>?qxr. 



(4.6) 



Combining equation (j4.5p and (j4.6p establishes equation (|4.ip . Equations 
r2]) . (fO|) and (fO]) follow similarly. ■ 



Since the random vectors $Y_j = (Y_{1j}, \dots, Y_{qj})$ are independent and identically
distributed so are the $U_j = (U_{1j}, \dots, U_{qj})$ for $j = 1, \dots, d$. Also each
$U_{ij}$ satisfies

\[ |U_{ij}| \le \max\{\log(1 + d^{-1/2} q), |\log(1 - d^{-1/2} q)|\} \to 0 \]

as $d \to \infty$. Such a collection of random vectors suggests the use of a central
limit theorem.

The following standard proposition can be established using the Central
Limit Theorem and Gaussian approximation.

Proposition 4.2. Let ^ : M"? M 6e a differentiable bounded function 
and let e > 0. Let Vi . . . , Vo be a sequence of iid q- dimensional vectors 
denoted Vj = {Vij, . . . ,Vgj). Let ^ £ W be a vector and let E G W^'i 
be a positive semi-definite symmetric q x q-matrix. Let {Wi,...^Wq) be 
distributed according to the q-dimensional Gaussian vector N(iJ,,T,) . 

Suppose there exists some C > such that for 1 < i < j < q the following 
holds: \\fJ-i\\oo < C, ||S 



ij \ \oo 



< C, ll/i - DEViWoo < CD-^/'^ and ||S - 
II • I loo denotes the standard L°^ norm. Then 



DCoviVi)\\oo < CD-^I^ and 
there exists a D' depending only on q,C and ip such that if D > D' then 



i=l 



< e 



Let $\mu$ be the $q$-dimensional vector given by

\[ \mu_i = \begin{cases} \frac{q}{2} & i = 1, \\ -\left( \frac{1}{2} + \frac{1}{q - 1} \right) q & \text{otherwise,} \end{cases} \]

and let $\Sigma$ be the $q \times q$ covariance matrix given by

\[ \Sigma_{ij} = \begin{cases} q & i = j, \\ -\frac{q}{q - 1} & \text{otherwise.} \end{cases} \]

Define

\[ \psi(w_1, \dots, w_q) = \frac{e^{w_1}}{\sum_{i=1}^{q} e^{w_i}}. \]

The function $\psi$ is positive, analytic and bounded by 1. Now if $(W_1, \dots, W_q)$
is a Gaussian vector distributed according to $N(0, \Sigma)$ then $(s\mu_1 + \sqrt{s} W_1, \dots,
s\mu_q + \sqrt{s} W_q)$ is distributed according to $N(s\mu, s\Sigma)$. We define

\[ g(s) = g_q(s) = E\psi(s\mu_1 + \sqrt{s} W_1, \dots, s\mu_q + \sqrt{s} W_q) - \frac{1}{q}. \qquad (4.7) \]

Since $Z_i = \exp(\sum_{j=1}^{d} U_{ij})$ we have that

\[ \frac{Z_1}{\sum_{i=1}^{q} Z_i} = \psi\left( \sum_{j=1}^{d} U_{1j}, \dots, \sum_{j=1}^{d} U_{qj} \right). \]

Then Proposition 4.2 and Lemma 4.1 immediately imply the following lemma.

Lemma 4.3. For each $\varepsilon > 0$ there exists a $d'$ such that when $d > d'$,

\[ \left| x_{n+1} - g(\hat\lambda^2 x_n) \right| \le \varepsilon. \]

Understanding the function $g_q(s)$, and in particular the solutions to the
equation $g_q(s) = s$, provides key information into the reconstruction problem
when $d$ is large. Since $0 \le x_n \le \frac{q-1}{q}$ we will restrict our attention on $g$ to
this interval.
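A quick way to get a feel for $g_q$ is to estimate it by simulation. The Python sketch below is a Monte Carlo illustration under stated assumptions, not the paper's method: it takes $\mu_1 = \frac{q}{2}$, $\mu_i = -(\frac{1}{2} + \frac{1}{q-1})q$ for $i \ge 2$, $\Sigma_{ii} = q$ and $\Sigma_{ij} = -\frac{q}{q-1}$ for $i \ne j$ (the leading terms of Lemma 4.1), and $\psi(w) = e^{w_1} / \sum_i e^{w_i}$. It samples $W \sim N(0, \Sigma)$ as $\frac{q}{\sqrt{q-1}}(G - \bar G)$ with $G$ a vector of independent standard Gaussians, which has exactly this covariance. The function name, sample size and seed are arbitrary choices.

```python
import math
import random


def g_monte_carlo(q, s, samples=20000, seed=0):
    """Monte Carlo estimate of g_q(s) = E psi(s*mu + sqrt(s)*W) - 1/q."""
    rng = random.Random(seed)
    mu_first = q / 2.0
    mu_other = -(0.5 + 1.0 / (q - 1)) * q
    scale = q / math.sqrt(q - 1)
    root_s = math.sqrt(s)
    total = 0.0
    for _ in range(samples):
        gauss = [rng.gauss(0.0, 1.0) for _ in range(q)]
        mean = sum(gauss) / q
        # W = scale * (G - mean(G)) has variance q and covariance -q/(q-1).
        logits = [s * (mu_first if i == 0 else mu_other)
                  + root_s * scale * (gauss[i] - mean) for i in range(q)]
        top = max(logits)  # subtract the maximum for numerical stability
        weights = [math.exp(v - top) for v in logits]
        total += weights[0] / sum(weights)
    return total / samples - 1.0 / q
```

For small $s$ the estimate should be close to $s$ itself, since the first-order terms of $E\psi$ combine to give $g_q(s) = s + O(s^2)$; the statistical error is of order $\sqrt{s/(q \cdot \text{samples})}$.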



Lemma 4.4. For each $q$, the function $g_q$ is continuously differentiable on
the interval $(0, \frac{q-1}{q}]$ and increasing.

Proof. Since



sup 



d 



dx 1 + 

we have that when s > 0, 



sup 



(l + e^)2 



E 



-^i^isfJ-l + \^Wl, . . . , S/ig + ^/sWq] 



< 



1 " 



1=1 



— S/ij + ^/sWi 

ds 



(4.8) 



< oo 



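which establishes that $g_q$ is differentiable. Now let $(\tilde W_1, \tilde W_2, \dots, \tilde W_q)$ be an
independent copy of $(W_1, \dots, W_q)$. Then when $0 \le s' \le s$ the following
equality in distribution holds

\[ \sqrt{s}\,(W_1, \dots, W_q) \stackrel{d}{=} \sqrt{s'}\,(W_1, \dots, W_q) + \sqrt{s - s'}\,(\tilde W_1, \dots, \tilde W_q). \]

Recall that if $W$ is distributed as $N(\mu, \sigma^2)$ then $E e^{W} = e^{\mu + \sigma^2/2}$. For $2 \le i \le q$,
since $\tilde W_i - \tilde W_1$ is distributed as $N(0, 2q + \frac{2q}{q-1})$,

\[ E\left[ \exp\left( \sqrt{s'}(W_i - W_1) + \sqrt{s - s'}(\tilde W_i - \tilde W_1) \right) \,\Big|\, W_1, \dots, W_q \right] = \exp\left( \sqrt{s'}(W_i - W_1) + (s - s')\left( q + \frac{q}{q - 1} \right) \right). \]

Noting that $x \mapsto \frac{1}{1+x}$ is convex, by Jensen's inequality

\[ g_q(s) = E\psi(s\mu_1 + \sqrt{s} W_1, \dots, s\mu_q + \sqrt{s} W_q) - \frac{1}{q} \]
\[ = E\,\frac{1}{1 + \sum_{i=2}^{q} \exp\left( s(\mu_i - \mu_1) + \sqrt{s'}(W_i - W_1) + \sqrt{s - s'}(\tilde W_i - \tilde W_1) \right)} - \frac{1}{q} \]
\[ \ge E\,\frac{1}{1 + \sum_{i=2}^{q} E\left[ \exp\left( s(\mu_i - \mu_1) + \sqrt{s'}(W_i - W_1) + \sqrt{s - s'}(\tilde W_i - \tilde W_1) \right) \,\big|\, W_1, \dots, W_q \right]} - \frac{1}{q} \]
\[ = E\,\frac{1}{1 + \sum_{i=2}^{q} \exp\left( s'(\mu_i - \mu_1) + \sqrt{s'}(W_i - W_1) \right)} - \frac{1}{q} \]
\[ = g_q(s') \]

which establishes that $g_q(s)$ is increasing.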
Lemma 4.5. For all $q$ and small $s$, we have that

\[ g_q(s) = s + \frac{q(q - 4)}{2(q - 1)} s^2 + O(s^3) \qquad (4.9) \]

and so when $q \ge 5$ there is a root $0 < s^* < \frac{q-1}{q}$ to the equation $g(s^*) = s^*$.

Proof. Using the identity



a 

r + s 



i-l 



s"* r + s 
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and taking a = exp(s/ni+v^W"i), s = q and r = {Yli=i exp(s/Xj + ■\/sWi) — q) 
we have that 



9q{s) = Etpism + ^/sWl, ...,SHq + \^Wg) - ^ 



, (ELi exp(s//i + y/sWi) - qf exp(s/y,i + y^Wi) 1 



+ E- 



q'^ Yl=i^^v{sii., + ^/sWi) q' 

(4.10) 



Now again using the fact that if W is distributed as N{fi, s^) then Ee^ = 
g/i+s /2 doing Taylor series expansions with the help of Mathematica we 
have that 



E ^( 1)'-^ ^^i=i 6^P(^M^ + VsWj) - qf ^ exp(s;m + ^/sWx) 
i=l ^ 

( 95(9-10) in„„ Q „ o 95(39-5) o 95(39-5) 

Aqe^'i' + &e~^=^ -e^^'i' + ^e^^q^ -?,e «-i ^ + 3e «-i 

O 95(9-5) „ „ Q O 95(29-5) 95(9-10) 95(9-6) 

-6e 9-1 _ g3 _ g^2g3qs g_i2e 9-1 

95(9-10) r, 95(9-3) o 95(9-3) 95(9-6) 95(9-6) 

-e 9-1 q'^ + Ae ^-i g^-4e ^-i g + 4e ^-i g'^ + Se «-i g 

2 95(2 9-5) 9s(-10+3 9) 2 2 2 95(9-3) 

— 4e 9-1 q — 3e 1-'^ q — 3e g +9e g + Ge 9-1 gi 



95(9-3) 95(-10+3 9) „ o (9-2)59 95(9-10) „ 95(-10+3 9) \ 

-eg'^e -6e 9-1 +4g''e^*-8e 9-1 g + 6e 9-1 g^ + 9e 9-1 gl 

g 2 g-1 6 
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and 

E.(Ei=i exp(s/ii + ^/sWi) - 

¥ 

496^9^ + 606 -6^°«' + 16e^ 9^-56^ ^-l g 

+ 46 — 6g 6 9-1 —126 — e i~^q — 356 i-^q 

fi _2£_ in c gs n qs(3 1;-5) i n 

- 24 6 ^~q + 50e ^^^"5 + 44 6 ^~q^ + 5e 9-1 +106 ^'^"^^ 



g3(q-5) ^ ^ ^ ^ qs o 93(2 g-5) 93(9-10) 

9-1 

93(9-6) 93(9-10) r, 93(9-3) „ o 93(9-3) 93(9-6) 



o 93W-aj , r, /I __2£_ 9 

- 306 9-1 -g4_6g2g3gs_^4^4g ,_i_^;l06 9-1 -llOe 9-1 q 



- 72 6 9-1 q^-We 9-1 g3_^]^2 6 9-1 g2_i2e^ 9-1 g + 246 9-1 q^ 

93(9-6) 9 93(2 9-5) 93(-10+3 9) „ 93(9-5) „ 9 93(9-5) 

+ 48 6 9-1 g- 106 9-1 g- 106 9-1 0^^- 156 9-1 g2 + 456 9-1 q 

93(9-3) 93(9-3) 1 n 93 c 93 „ o qs 

+ 186 9-1 q^-l8q^e 9-1 - 246 ^9-1 - 246 9-ig'^ + 18g^6 9-1 

93(-10+3 9) 93 „ -3 (9-2)39 

- 20 6 9-1 -Aq-^e i-^ + Aq-^e'^'' - W 9-1 g 

93(9-10) „ 93(-10+3 9) 

+ 60 6 9-1 q^ + me 9-1 , 

= 

Since < (SLiCxp(s^ +v/iw,)-,)^ p ^ exp(.;.i+v^m) < ^ eombining 

these estimates establishes equation ()4.9p . 

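Since $q - 4 > 0$ when $q \ge 5$, for small $s > 0$ we have that $g_q(s) > s$. Since

\[ g_q\left( 1 - \frac{1}{q} \right) = E\psi(s\mu_1 + \sqrt{s} W_1, \dots, s\mu_q + \sqrt{s} W_q) - \frac{1}{q} < 1 - \frac{1}{q} \]

by the Intermediate Value Theorem there must be some $0 < s^* < \frac{q-1}{q}$ such
that $g(s^*) = s^*$.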



Theorem 4.6. When $q \ge 5$ define

\[ w^* = \inf\left\{ w : \exists\, 0 < s^* \le \frac{q-1}{q},\ g(w s^*) = s^* \right\}. \]

Then $0 < w^* < 1$ and for each $\delta > 0$ there exists a $d'(q, \delta)$ such that if
$d > d'$ then the model has reconstruction when $\hat\lambda^2 \ge w^* + \delta$ but does not
have reconstruction when $\hat\lambda^2 \le w^* - \delta$.

Proof. The key idea of this result is that when $\hat\lambda^2 > w^*$, $g_q(\hat\lambda^2 s)$ has a
non-zero attractive fixed point as a function of $s$ while if $\hat\lambda^2 < w^*$ then $g_q(\hat\lambda^2 s) < s$
for $s > 0$. By Lemma 4.5 we have the expansion $g_q(s) = s + \frac{q(q-4)}{2(q-1)} s^2 + O(s^3)$
so for small $s$, $g_q(s) > s$. It also implies that for any $0 < w < 1$, the set
$\{0 < s \le \frac{q-1}{q} : g_q(ws) \ge s\}$ is a compact set bounded away from 0. By the
continuity of $g_q$,

\[ \left\{ 0 < s \le \frac{q-1}{q} : g(w^* s) = s \right\} = \bigcap_{w^* < w < 1} \left\{ 0 < s \le \frac{q-1}{q} : g(ws) \ge s \right\} \]

and by the Finite Intersection Property of compact sets it is nonempty and
compact so let $s^* \in \{0 < s \le \frac{q-1}{q} : g(w^* s) = s\}$.
Now set = w* + 6 and so 

+ = 9,is*w*) = s*> s*^^ 

■w* + w* + 

Take d large enough so that Lemma 14.31 holds with < e < s* — s* y^_^_g ■ 
Then when x„ > s* y^_^_g since gg is monotone it follows that 

Xn+l > gq{{w* + S)Xn) - E 



w* + 5 w* + 5 

w 



s 



w* + 5 

and hence inf x„ > s* — ttt which establishes reconstruction. 



By equation (IXTU|) 



^ d{d-l) , 2 



where Cg does not depend on d or A. So when |A| < 1 and if Xn < then 



2 2 

\2 



Xn+l ^ A Xn + CqXn ^ A X„ H X„ < X„. 



When A^ < w* then g{)?s) < and so by Lemma 14.31 for large enough d, 

we have that for some n, x„ < ^^c ■ follows then that x„ converges to 
which proves non-reconstruction for large enough d. ■ 

4.1. Non-reconstruction for $q = 3$.

Lemma 4.7. When $q = 3$, for all $0 < s \le \frac{2}{3}$ we have $g_q(s) < s$.

We defer this proof to the appendix.

Lemma 4.8. When q = 3 there exists a 5 > and N not depending on d 
or A such that if Xn < <5 and n > N then 

2 3d{d- 1) 4 2 

Xn+l <dXXn- A X„. 

The proof is essentially identical to the proof of Lemma 3.1 and so we
omit it.

Proof. (Theorem 1.1)

At the Kesten-Stigum bound we have that $|\hat\lambda| = 1$. Since $g(s) < s$ for
all $s > 0$, by Lemma 4.3 there exists a $d'$ such that when $d > d'$ and $m$ is
sufficiently large then $x_m < \delta$ where $\delta$ is the constant in Lemma 4.8. It
follows from Lemma 4.8 that if for some $m$, $x_m < \delta$ then $\lim_n x_n = 0$ and
hence non-reconstruction.

Appendix A. Deferred Proof

Proof. (Lemma 4.7)

Recall that $\mu$ is the $q$-dimensional vector given by

\[ \mu_i = \begin{cases} \frac{q}{2} & i = 1, \\ -\left( \frac{1}{2} + \frac{1}{q - 1} \right) q & \text{otherwise,} \end{cases} \]

and that $\Sigma$ is the $q \times q$ covariance matrix given by

\[ \Sigma_{ij} = \begin{cases} q & i = j, \\ -\frac{q}{q - 1} & \text{otherwise.} \end{cases} \]

With $(W_1, \dots, W_q)$ a Gaussian vector distributed according to $N(0, \Sigma)$ the
function $g_q(s)$ is defined as

\[ g_q(s) = E\psi(s\mu_1 + \sqrt{s} W_1, \dots, s\mu_q + \sqrt{s} W_q) - \frac{1}{q}, \]

where

\[ \psi(w_1, \dots, w_q) = \frac{e^{w_1}}{\sum_{i=1}^{q} e^{w_i}}. \]

In this lemma we consider the case of $q = 3$. By equation (4.8) we have that
for any $x, y$,



1 + 1 + ef 



1, 

< - y\ 



1 



1 



1 + l + ey 



<^\x-y\ 



Using this estimate and the fact that = J ^ it follows that 



153 



1 ^ 

(si) - 53(52)1 <jYl - S2)| + l^/sT - Vs^\ E\Wi\ 



i=l 



Now max^gjQ 2, -^x^/"^ = ^-/lO- Hence if we take 0.1 < si < S2 < | then 



15 / 135 

l<?3(si) -53(52)1 < (y + V 16^)1''^ """^l <3|si-S2|. (A.l) 



Let 



S - i2L 667 I 

1000' lOOO''"' 1000 J 



and suppose that 



1000 



(A.2) 



Now fix some s E [0.1, |]. Then for some s* G 5, |s — s*\ < which 
implies that 



53(5) - s < 53(5*) - s* + \g3{s) - gsis*)] + \s - s* 
5 



< 



1000 



+ 4 s - s*\ + \s- s* 



< 



where the second inequality follows from equation (jA.ip . So proving equa- 
tion ()A.2p would imply that 53(5) < s for all 0.1 < s < |. We do this by a 
rigorous method of numerical integration. 

Let $U_1, U_2$ be independent standard Gaussians. The random vectors
$(W_2 - W_1, W_3 - W_1)$ and $(3U_1, \frac{3}{2}U_1 + \frac{3\sqrt{3}}{2}U_2)$ have the same covariance
matrix and therefore are equal in distribution. Hence
1 1 



gsis) = E- 



E 



1 1 



1 + exp (-f + 3V^C/i) + exp (-f + l^/^Ui + ^^f/s 



3 



1 + exp (-f + S^^x) + exp (^-f + ^^/sx + ^^/sy 
exp(-xV2 - yV2) , , 1 



dx dv 

< 



5J-5 



1 + exp (-f + 3Vix) + exp ( -f + f^^x + ^V^y 



_exp(-xV2-,V2) 1^10-5 
27r 3 

where the inequahty uses the standard inequality that 



f 

J X 



exp(-xV2) , exp(-xV2) 
dx < 



IX \/27r x\p2j{ 
which implies that 



// 



exp(-xV2 - 2/V2) < ^ exp(-5V2) ^ 



/m2\[-5,5]2 27r 5\/27r 

Define the function (^{i) = min{|i|, \i + 1|}. Then for integers i and j, 

M5" /■ 2W exp(— — y^/2) dx 

1 + exp (-f + 'i^sx) + exp (-f + f^^x + ^^iy)) 27r 



200 " 200 



< 



^^P(-(li)V2- (1^)72)40000-1 



1 + exp (-f + 3Vi2So) + (-f + + ¥V^2So)) 2^' 

(A.4) 

Let tpiijj) denote the right hand-side of equation ()A.4p . Substituting this 
bound in (|A.3p we have that 



^ 999 999 

i=-1000 j=-1000 



The right hand side of equation (A.5) is merely a combination of basic
arithmetic operations and exponentials and so can be rigorously computed to
arbitrarily high precision (e.g. in Mathematica). Evaluating this expression
for each $s^* \in S$ establishes equation (A.2). As noted above this implies that
$g(s) < s$ when $s \in [0.1, \frac{2}{3}]$.
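The numerical part of this argument can be imitated, non-rigorously, in a few lines of Python. The sketch below assumes the representation $g_3(s) = E\left[ 1 / \left( 1 + e^{-9s/2 + 3\sqrt{s}U_1} + e^{-9s/2 + \frac{3}{2}\sqrt{s}U_1 + \frac{3\sqrt{3}}{2}\sqrt{s}U_2} \right) \right] - \frac{1}{3}$ with $U_1, U_2$ independent standard Gaussians, and approximates the expectation by a plain midpoint sum over $[-5, 5]^2$. It is not the one-sided bound of (A.4) and (A.5), so it only illustrates the inequality $g_3(s) < s$ and does not prove it; the function name, grid width and step are choices made here.

```python
import math


def g3_quadrature(s, half_width=5.0, step=0.05):
    """Midpoint-rule approximation of g_3(s) on [-half_width, half_width]^2."""
    n = int(round(2 * half_width / step))
    root_s = math.sqrt(s)
    nodes = [-half_width + (k + 0.5) * step for k in range(n)]
    # Precompute the y-dependent factors once.
    y_density = [math.exp(-0.5 * y * y) for y in nodes]
    y_factor = [math.exp(1.5 * math.sqrt(3.0) * root_s * y) for y in nodes]
    total = 0.0
    for x in nodes:
        first = math.exp(-4.5 * s + 3.0 * root_s * x)
        second_x = math.exp(-4.5 * s + 1.5 * root_s * x)
        x_density = math.exp(-0.5 * x * x)
        for j in range(n):
            total += x_density * y_density[j] / (1.0 + first + second_x * y_factor[j])
    return total * step * step / (2.0 * math.pi) - 1.0 / 3.0
```

The Gaussian mass outside $[-5, 5]^2$ is of order $10^{-6}$, far smaller than the gap between $g_3(s)$ and $s$ reported in the proof for $s \in [0.1, \frac{2}{3}]$.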

It remains to show that $g_3(s) < s$ when $0 < s < 0.1$. Using equation
(4.10) and noting that

\[ \frac{\exp(s\mu_1 + \sqrt{s} W_1)}{\sum_{i=1}^{3} \exp(s\mu_i + \sqrt{s} W_i)} \le 1 \]

we have that



[I2i=i exp(s/Ui + y/sWi) - 3j exp(s/xi + y/sWi 



i=l 



81 3 



Using the fact that if W is distributed as N{fi,a'^) then Ee 
have after simphfying that 
, , , 74 4 _a./o 4 

53 



w 



202 3./2, ,i 
27 27' ' 27" 81 27 81 

By Taylor's Theorem we have that if |x| < 1.2 then 



16 



is) < i;-;^e-^^/^+^e3^-^e-3^/^+^e-«^+^ei2^-^e^^/^ (A.6) 



we 



27 



exp 



i=0 



< — max 
6! ye[-i.2,i.2] 



d^ey 



dy^ 



< 2- 



6! 



Applying this to equation (A.6) we get that when $0 < s < 0.1$,

\[ g_3(s) - s \le \frac{s^2}{1280}\, h(s) \]

where

\[ h(s) = -960 - 1440 s + 58860 s^2 + 98334 s^3 + 595795 s^4. \]

Now $h(s)$ is convex and $h(0) < 0$ and $h(0.1) < 0$, which implies that $h(s) < 0$
for all $0 \le s \le 0.1$. It follows that $g_3(s) < s$ for all $0 < s \le 0.1$ which
completes the proof.
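The final step is a finite check on a quartic and can be verified directly. The Python sketch below assumes $h(s) = -960 - 1440s + 58860s^2 + 98334s^3 + 595795s^4$, with the powers of $s$ taken to be $1, 2, 3, 4$ in order; the function names are illustrative. Convexity on $[0, 0.1]$ follows because every term of the second derivative is nonnegative there, and a convex function that is negative at both endpoints of an interval is negative throughout.

```python
def h(s):
    """Quartic from the end of the proof (exponents 1..4 assumed)."""
    return -960 - 1440 * s + 58860 * s ** 2 + 98334 * s ** 3 + 595795 * s ** 4


def h_second_derivative(s):
    """h''(s); every term is nonnegative for s >= 0, so h is convex there."""
    return 2 * 58860 + 6 * 98334 * s + 12 * 595795 * s ** 2


def negative_on_interval(points=1000):
    """Grid check that h < 0 and h'' > 0 on [0, 0.1]."""
    grid = [0.1 * k / points for k in range(points + 1)]
    return all(h(s) < 0 and h_second_derivative(s) > 0 for s in grid)
```

Note that $h(0)/1280 = -\frac{3}{4}$ agrees with the coefficient $\frac{q(q-4)}{2(q-1)}$ of $s^2$ in the expansion of $g_q$ at $q = 3$.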
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